Posted to user@flink.apache.org by Baswaraj Kasture <kb...@gmail.com> on 2016/08/20 10:33:14 UTC

How to share a text file across tasks at run time in Flink.

I am running a Flink standalone cluster.

I have a text file that needs to be shared across tasks when I submit my
application; in other words, I want to put this text file on the classpath of
the running tasks.

How can we achieve this with Flink?

In Spark, spark-submit has a --jars option that puts all the specified files
on the classpath of the executors (executors run in separate JVMs and are
spawned dynamically, so this is possible).

Flink's task managers run tasks as separate threads inside the task manager
JVM (?), so how can we make this text file accessible to all tasks spawned by
the current application?

Using HDFS or NFS, or including the file in the program jar, is one way that I
know of, but I am looking for a solution that lets me provide the text file at
run time and still have it accessible in all tasks.
Thanks.

Re: How to share a text file across tasks at run time in Flink.

Posted by Robert Metzger <rm...@apache.org>.
Hi,

You could use ZooKeeper if you want to dynamically change the DB name /
credentials at runtime.

The GlobalJobParameters are immutable at runtime, so you cannot pass
updates through them to the cluster.
They are intended for parameters that apply to all operators/the entire job
and for display in the web interface.
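
For illustration, here is a minimal sketch of how GlobalJobParameters are
typically set on the client at submission time and read back inside a rich
function's open() method; the parameter name "db.name" and its default value
are made up:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JobParamsSketch {
    public static void main(String[] args) throws Exception {
        // Parsed once on the client; immutable for the lifetime of the job.
        ParameterTool params = ParameterTool.fromArgs(args);

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Makes the parameters visible to every operator and to the web UI.
        env.getConfig().setGlobalJobParameters(params);

        env.fromElements("a", "b", "c")
           .map(new RichMapFunction<String, String>() {
               private String dbName;

               @Override
               public void open(Configuration conf) {
                   // Read the job-wide parameters back on the task manager.
                   ParameterTool p = (ParameterTool) getRuntimeContext()
                           .getExecutionConfig().getGlobalJobParameters();
                   dbName = p.get("db.name", "defaultDb");
               }

               @Override
               public String map(String value) {
                   return dbName + ":" + value;
               }
           })
           .print();

        env.execute("global-job-parameters sketch");
    }
}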

Regards,
Robert



Re: How to share a text file across tasks at run time in Flink.

Posted by Baswaraj Kasture <kb...@gmail.com>.
Thanks to all for your inputs.
Yes, I could put all these common configurations/rules in a DB and the workers
could pick them up dynamically at run time.
In that case, do the DB configuration/connection details need to be hard coded?
Is there any way a worker can pick up the DB name/credentials etc. at run time
dynamically?

I am going through the feature/API documentation, but how about using
function closures and setGlobalJobParameters/getGlobalJobParameters?

+Baswaraj


Re: How to share a text file across tasks at run time in Flink.

Posted by Maximilian Michels <mx...@apache.org>.
Hi!

1. The community is working on adding side inputs to the DataStream
API. That will allow you to easily distribute data to all of your
workers.

2. In the meantime, you could use `.broadcast()` on a DataSet to
broadcast data to all workers. You still have to join that data with
another stream though.

3. The easiest method of all is to simply load your file in the
RichMapFunction's open() method. The file can reside in a distributed
file system which is accessible by all workers.
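
As a rough sketch of option 3, assuming the file lives on a path that every
task manager can reach (e.g. HDFS or a shared mount); the path and the
key=value format are made up:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

public class SharedFileMapper extends RichMapFunction<String, String> {
    private transient Map<String, String> rules;

    @Override
    public void open(Configuration parameters) throws Exception {
        rules = new HashMap<>();
        // Hypothetical location; must be readable from every task manager.
        Path path = new Path("hdfs:///shared/rules.properties");
        FileSystem fs = path.getFileSystem();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] kv = line.split("=", 2);
                if (kv.length == 2) {
                    rules.put(kv[0].trim(), kv[1].trim());
                }
            }
        }
    }

    @Override
    public String map(String value) {
        return rules.getOrDefault(value, value);
    }
}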

Cheers,
Max


Re: How to share a text file across tasks at run time in Flink.

Posted by Jark Wu <wu...@alibaba-inc.com>.
Hi,

I think what Baswaraj wants is exactly something like Storm's Distributed Cache API [1] (if I'm not misunderstanding).

> The distributed cache feature in storm is used to efficiently distribute files (or blobs, which is the equivalent terminology for a file in the distributed cache and is used interchangeably in this document) that are large and can change during the lifetime of a topology, such as geo-location data, dictionaries, etc. Typical use cases include phrase recognition, entity extraction, document classification, URL re-writing, location/address detection and so forth. Such files may be several KB to several GB in size. For small datasets that don't need dynamic updates, including them in the topology jar could be fine. But for large files, the startup times could become very large. In these cases, the distributed cache feature can provide fast topology startup, especially if the files were previously downloaded for the same submitter and are still in the cache. This is useful with frequent deployments, sometimes few times a day with updated jars, because the large cached files will remain available without changes. The large cached blobs that do not change frequently will remain available in the distributed cache.

We can look into whether this is a common use case and how to implement it in Flink.

[1] http://storm.apache.org/releases/2.0.0-SNAPSHOT/distcache-blobstore.html
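
For reference, Flink does already ship a simple file-distribution mechanism
along these lines in the DataSet API's ExecutionEnvironment: registerCachedFile(),
with the local copy retrieved through the RuntimeContext's DistributedCache.
It ships the file once at submission time rather than updating it during the
lifetime of the job. A rough sketch, with a made-up file path and cache name:

import java.io.File;
import java.nio.file.Files;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class DistCacheSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Register the file under a logical name; Flink copies it to every worker.
        env.registerCachedFile("hdfs:///shared/geo-data.txt", "geoData");

        env.fromElements("a", "b")
           .map(new RichMapFunction<String, String>() {
               private List<String> geoLines;

               @Override
               public void open(Configuration parameters) throws Exception {
                   // Local copy of the cached file on this task manager.
                   File cached = getRuntimeContext()
                           .getDistributedCache().getFile("geoData");
                   geoLines = Files.readAllLines(cached.toPath());
               }

               @Override
               public String map(String value) {
                   return value + " (" + geoLines.size() + " reference lines)";
               }
           })
           .print();
    }
}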
 

- Jark Wu 


Re: How to share a text file across tasks at run time in Flink.

Posted by Lohith Samaga M <Lo...@mphasis.com>.
Hi
Maybe you could use Cassandra to store and fetch all such reference data. This way the reference data can be updated without restarting your application.
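
As a rough sketch of that idea (using the DataStax Java driver; the keyspace,
table and column names are made up, and the data is simply re-read at most once
a minute so that updates take effect without a restart):

import java.util.HashMap;
import java.util.Map;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class CassandraRulesMapper extends RichMapFunction<String, String> {
    private transient Cluster cluster;
    private transient Session session;
    private transient Map<String, String> rules;
    private transient long lastRefreshMillis;

    @Override
    public void open(Configuration parameters) {
        cluster = Cluster.builder().addContactPoint("cassandra-host").build();
        session = cluster.connect("config_ks");
        refresh();
    }

    private void refresh() {
        Map<String, String> fresh = new HashMap<>();
        for (Row row : session.execute("SELECT rule_key, rule_value FROM rules")) {
            fresh.put(row.getString("rule_key"), row.getString("rule_value"));
        }
        rules = fresh;
        lastRefreshMillis = System.currentTimeMillis();
    }

    @Override
    public String map(String value) {
        if (System.currentTimeMillis() - lastRefreshMillis > 60_000) {
            refresh(); // pick up updated reference data at most once a minute
        }
        return rules.getOrDefault(value, value);
    }

    @Override
    public void close() {
        if (session != null) session.close();
        if (cluster != null) cluster.close();
    }
}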

Lohith

Sent from my Sony Xperia™ smartphone



Re: How to share a text file across tasks at run time in Flink.

Posted by Baswaraj Kasture <kb...@gmail.com>.
Thanks, Kostas!
I am using the DataStream API.

I have a few config/property files (key-value text files) and also some
business rule files (JSON).
These rules and configurations are needed when we process incoming events.
Is there any way to share them with the task nodes from the driver program?
I think this is a very common use case and I am sure other users face
similar issues.

+Baswaraj


Re: How to share a text file across tasks at run time in Flink.

Posted by Kostas Kloudas <k....@data-artisans.com>.
Hello Baswaraj,

Are you using the DataSet (batch) or the DataStream API?

If you are using the former, you can use a broadcast variable
(https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#broadcast-variables)
for your task.
If you are using the DataStream API, then there is no proper support for that.
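
A minimal sketch of the broadcast-variable approach in the DataSet API, with
made-up data set and variable names:

import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class BroadcastSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> rules = env.fromElements("rule-1", "rule-2");
        DataSet<String> events = env.fromElements("event-a", "event-b");

        events.map(new RichMapFunction<String, String>() {
                  private List<String> ruleList;

                  @Override
                  public void open(Configuration parameters) {
                      // Available on every parallel instance under this name.
                      ruleList = getRuntimeContext()
                              .getBroadcastVariable("rules");
                  }

                  @Override
                  public String map(String value) {
                      return value + " checked against " + ruleList.size() + " rules";
                  }
              })
              .withBroadcastSet(rules, "rules")
              .print();
    }
}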

Thanks,
Kostas
