Posted to user@flink.apache.org by Marco Villalobos <mv...@kineteque.com> on 2020/06/16 04:46:17 UTC

Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String).

If there is a way, I'd appreciate an example.

Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Posted by Marco Villalobos <mv...@kineteque.com>.
Okay, it is not supported.  

I thought about this more and I disagree that this would break "distributability".

Currently, the API accepts a String which is a path, whether it be a path to a remote URL or a local file.
However, once that path or URL has been parsed, an InputStream ultimately serves as the abstraction that reads the input from its source.

An InputStream can be remote, it can be a local file, or it can be a connection to a server or another client, and in each of those situations the system remains distributed.

Also, such an enhancement promotes "interoperability" because the user can then decide the source of that data, rather than being forced to use a URL or physical file path.

I think this feature would make testing and demos more portable. I was writing a demo, and I wanted it to run without command-line arguments, so this would have been very handy. I want the user to simply check out the code and run it without having to supply a command-line parameter declaring where the input file resides.

Thank you.
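
A possible workaround while the API only accepts paths, sketched under assumptions: copy the classpath resource to a temporary file and hand that file's path to Flink. The class and method names below are hypothetical, the stream in main() is an in-memory stand-in for getResourceAsStream("timeseries.csv"), and the Flink call appears only as a comment. This would only suit local runs, since a temp file on one machine is not visible to remote task managers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Flink.
class ClasspathToTempFile {

    // Copies everything from the stream into a new temp file and returns its path.
    static Path copyToTempFile(InputStream in, String prefix, String suffix) throws IOException {
        Path tmp = Files.createTempFile(prefix, suffix);
        tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        try (InputStream source = in) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // In a real demo this stream would come from something like:
        //   ClasspathToTempFile.class.getClassLoader().getResourceAsStream("timeseries.csv")
        // which returns null when the resource is missing, so check for that.
        InputStream in = new ByteArrayInputStream(
                "sensor-a,1\nsensor-b,2\n".getBytes(StandardCharsets.UTF_8));

        Path csv = copyToTempFile(in, "timeseries", ".csv");
        System.out.println("Copied resource to " + csv);

        // With Flink on the classpath, the path could then be passed to the
        // path-based API, for example: env.readTextFile(csv.toUri().toString())
    }
}
```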

> On Jun 16, 2020, at 4:57 AM, Aljoscha Krettek <al...@apache.org> wrote:
> 
> Hi Marco,
> 
> this is not possible since Flink is designed mostly to read files from a distributed filesystem, where paths are used to refer to those files. If you read from files on the classpath you could just use plain old Java code and won't need a distributed processing system such as Flink.
> 
> Best,
> Aljoscha
> 
> On 16.06.20 06:46, Marco Villalobos wrote:
>> Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?
>> I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String).
>> If there is a way, I'd appreciate an example.
> 


Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Posted by Marco Villalobos <mv...@kineteque.com>.
While I still think it would be great for Flink to accept an InputStream and let the programmer decide whether it is backed by a remote TCP call or a local file, for the sake of my demo I simply resolved the file path within Gradle and supplied it to the run task of the Gradle application plugin like this:

run {
    args = ["--input-file", file('timeseries.csv')]
}

and that launched my application with minimal configuration.
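
For small demo files, another option would be to combine the classpath stream with the env.fromCollection() suggestion quoted below. This is a minimal sketch, not a Flink feature: the class and method names are hypothetical, the stream in main() is an in-memory stand-in for getResourceAsStream("timeseries.csv"), and the Flink call appears only as a comment. Because the whole collection lives in memory on the client, this would only be reasonable for small inputs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink.
class ClasspathLines {

    // Reads the stream as UTF-8 text and returns one list entry per line.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In a real demo this stream would come from something like:
        //   ClasspathLines.class.getClassLoader().getResourceAsStream("timeseries.csv")
        // which returns null when the resource is missing, so check for that.
        InputStream in = new ByteArrayInputStream(
                "sensor-a,1\nsensor-b,2\n".getBytes(StandardCharsets.UTF_8));

        List<String> lines = readLines(in);
        System.out.println("Read " + lines.size() + " lines");

        // With Flink on the classpath, the list could then become a source:
        //   env.fromCollection(lines)
        // followed by a map step that splits each CSV line into fields.
    }
}
```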

> On Jun 17, 2020, at 7:11 AM, Aljoscha Krettek <al...@apache.org> wrote:
> 
> Hi,
> 
> for simple demos you can also use env.fromElements() or env.fromCollection() to create a source from some data that you have already available.
> 
> Does that help?
> 
> Best,
> Aljoscha
> 
> On 16.06.20 15:35, Marco Villalobos wrote:
>> Okay, it is not supported.
>> I understand such a feature is not needed in production systems, but it could make testing and demos more portable. I was writing a demo, and I wanted it to run without command-line arguments, which would have been very handy. I want the user to simply checkout the code and run it without having to supply a command line parameter declaring where the input file resides.
>> Thank you.
>>> On Jun 16, 2020, at 4:57 AM, Aljoscha Krettek <al...@apache.org> wrote:
>>> 
>>> Hi Marco,
>>> 
>>> this is not possible since Flink is designed mostly to read files from a distributed filesystem, where paths are used to refer to those files. If you read from files on the classpath you could just use plain old Java code and won't need a distributed processing system such as Flink.
>>> 
>>> Best,
>>> Aljoscha
>>> 
>>> On 16.06.20 06:46, Marco Villalobos wrote:
>>>> Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?
>>>> I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String).
>>>> If there is a way, I'd appreciate an example.
>>> 
> 


Re: Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Marco,

this is not possible since Flink is designed mostly to read files from a 
distributed filesystem, where paths are used to refer to those files. If 
you read from files on the classpath you could just use plain old Java 
code and won't need a distributed processing system such as Flink.

Best,
Aljoscha

On 16.06.20 06:46, Marco Villalobos wrote:
> 
> Does Flink support reading files or CSV files from java.io.InputStream instead of file paths?
> 
> I'd rather just store my file on the class path and load it with java.lang.ClassLoader#getResourceAsStream(String).
> 
> If there is a way, I'd appreciate an example.
>