Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2014/10/29 20:31:00 UTC

WriteAsText bug or bad name?

Hi all,
while running the example at
http://flink.incubator.apache.org/docs/0.7-incubating/local_execution.html
I expected writeAsText on a local path to create a text file on my local
filesystem. Instead it creates something similar to a sequence file (inside
a folder).
I find this misleading: either the API name is wrong or this is a bug
(IMHO).
By the way, how can I modify the following program to write its results to
a single text file on my local filesystem?

public static void main(String[] args) throws Exception {
  ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
  DataSet<String> data = env.readTextFile("file:///tmp/res.txt");
  data.filter(new FilterFunction<String>() {
    public boolean filter(String value) {
      return value.startsWith("http://");
    }
  }).writeAsText("file:///tmp/res.txt");
  env.execute();
}

Best,
Flavio

Re: WriteAsText bug or bad name?

Posted by Robert Waury <ro...@googlemail.com>.
Just call setParallelism() on the sink. It specifies how many threads are
used for the operator.

writeAsText("file:///tmp/res.txt").setParallelism(1);

With a parallelism of 1 you get a single output file.

Cheers,
Robert

On Wed, Oct 29, 2014 at 10:22 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Would it be that difficult to change the behaviour for file:/// and create
> a single file? Or is there a way to do that?

Re: WriteAsText bug or bad name?

Posted by Flavio Pompermaier <po...@okkam.it>.
That is not a big problem; it should just be well documented :)

On Mon, Nov 3, 2014 at 12:09 PM, Stephan Ewen <se...@apache.org> wrote:

> Hey!
>
> Parallel outputs require multiple output files.
>
> The only way to make this a single file by default is to set the default
> parallelism of file outputs to 1. That would cause many surprises on
> cluster execution, actually.
>
> It may be a fair compromise to set the default parallelism of sinks to 1
> if the execution environment is the local environment.
>
> Stephan

Re: WriteAsText bug or bad name?

Posted by Stephan Ewen <se...@apache.org>.
Hey!

Parallel outputs require multiple output files.

The only way to make this a single file by default is to set the default
parallelism of file outputs to 1. That would cause many surprises on
cluster execution, actually.

It may be a fair compromise to set the default parallelism of sinks to 1 if
the execution environment is the local environment.

Stephan


On Mon, Nov 3, 2014 at 12:06 PM, Fabian Hueske <fh...@apache.org> wrote:

> OK, I assume the problem of creating multiple files (+ output directory)
> is fixed by setting the DOP of the OutputFormat to 1, right?
>
> But you still get binary output with a TextOutputFormat that writes a
> DataSet<String>?

Re: WriteAsText bug or bad name?

Posted by Fabian Hueske <fh...@apache.org>.
OK, I assume the problem of creating multiple files (+ output directory) is
fixed by setting the DOP (degree of parallelism) of the OutputFormat to 1,
right?

But you still get binary output with a TextOutputFormat that writes a
DataSet<String>?

2014-11-03 11:58 GMT+01:00 Flavio Pompermaier <po...@okkam.it>:

> Nope. This is actually a bug for me, I don't know what the FLINK community
> or committee think

Re: WriteAsText bug or bad name?

Posted by Flavio Pompermaier <po...@okkam.it>.
Nope. To me this is actually a bug, though I don't know what the Flink
community or committee thinks.

On Mon, Nov 3, 2014 at 11:52 AM, Fabian Hueske <fh...@apache.org> wrote:

> Hi Flavio,
>
> any updates on this bug?
>
> Thanks, Fabian

Re: WriteAsText bug or bad name?

Posted by Fabian Hueske <fh...@apache.org>.
Hi Flavio,

Any updates on this bug?

Thanks, Fabian

2014-10-29 22:36 GMT+01:00 Fabian Hueske <fh...@apache.org>:

> Regarding the text vs. sequence output.
> writeAsText() emits each record using its toString() method, which should
> be the String itself in your case.
>
> So if it would write binary data, something is wrong...

Re: WriteAsText bug or bad name?

Posted by Fabian Hueske <fh...@apache.org>.
Regarding the text vs. sequence output:
writeAsText() emits each record using its toString() method, which should
be the String itself in your case.

So if it writes binary data, something is wrong...
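
As a rough illustration of the toString()-based behaviour described above,
here is a minimal sketch in plain Java with no Flink involved. The class and
method names (ToStringTextSink, writeAsLines) are hypothetical and are not
part of the Flink API; the sketch only shows why a record type with a
sensible toString() should end up as readable text, one record per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Flink code.
public class ToStringTextSink {

    /** Writes one line per record, using record.toString() as the line content. */
    public static void writeAsLines(List<?> records, Path target) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Object record : records) {
            // For a String record, toString() returns the String itself.
            lines.add(record.toString());
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("records", ".txt");
        writeAsLines(Arrays.asList("http://a.example", "http://b.example"), out);
        // The file is plain UTF-8 text and can be read back line by line.
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

If the output of a sink built this way looked binary, the cause would have
to lie outside the record-to-text step, which is the point being made above.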


2014-10-29 22:34 GMT+01:00 Fabian Hueske <fh...@apache.org>:

> You can set the DOP of the data sink to 1 [1].
> There is also a config parameter whether to create a directory or not in
> case of DOP=1. If I remember correctly, the default is to NOT create
> a folder for DOP=1.
>
> [1]
> http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#parallel-execution
>
> Best, Fabian

Re: WriteAsText bug or bad name?

Posted by Fabian Hueske <fh...@apache.org>.
You can set the DOP (degree of parallelism) of the data sink to 1 [1].
There is also a config parameter that controls whether a directory is
created when DOP=1. If I remember correctly, the default is NOT to create
a folder for DOP=1.

[1]
http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#parallel-execution
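
The rule recalled above can be sketched as a small decision function. This
is plain Java written only to model the description in this message: the
class OutputLayoutSketch, the method targetPaths and the flag
alwaysCreateDirectory are hypothetical names, not Flink's API or its actual
config key, and naming part files by a 1-based task number is an assumption
made for the illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the described behaviour, not Flink code.
public class OutputLayoutSketch {

    /**
     * Returns the paths a file sink would write to:
     * a single file for parallelism 1 (unless a directory is forced),
     * otherwise one part file per parallel task inside a directory.
     */
    public static List<Path> targetPaths(Path output, int parallelism,
                                         boolean alwaysCreateDirectory) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<Path> targets = new ArrayList<>();
        if (parallelism == 1 && !alwaysCreateDirectory) {
            // One writer, no directory: the output path itself is the file.
            targets.add(output);
        } else {
            // The output path becomes a directory holding one file per task.
            for (int task = 1; task <= parallelism; task++) {
                targets.add(output.resolve(String.valueOf(task)));
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Path out = Paths.get("/tmp/res.txt");
        System.out.println(targetPaths(out, 1, false)); // single file
        System.out.println(targetPaths(out, 4, false)); // four part files
    }
}
```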

Best, Fabian

2014-10-29 22:22 GMT+01:00 Flavio Pompermaier <po...@okkam.it>:

> Would it be that difficult to change the behaviour for file:/// and create
> a single file? Or is there a way to do that?

Re: WriteAsText bug or bad name?

Posted by Flavio Pompermaier <po...@okkam.it>.
Would it be that difficult to change the behaviour for file:/// and create
a single file? Or is there a way to do that?
On Oct 29, 2014 9:52 PM, "Márton Balassi" <ba...@gmail.com> wrote:

> Dear Flavio,
>
> Yes, the writeAsText() method really creates a folder which contains a
> file for each execution thread, so your threads do not block each other and
> the execution can use multiple cores on your machine. You can see similar
> results if you try it with env.execute() from an IDE.
>
> There are filesystems, HDFS to mention the most prominent one which can
> transparently treat such folder structure as a single file and then it
> would behave as you expect. I hope this answers your question.
>
> Best,
>
> Marton

Re: WriteAsText bug or bad name?

Posted by Márton Balassi <ba...@gmail.com>.
Dear Flavio,

Yes, the writeAsText() method really does create a folder that contains one
file per execution thread, so your threads do not block each other and
the execution can use multiple cores on your machine. You can see similar
results if you try it with env.execute() from an IDE.

There are filesystems, HDFS being the most prominent one, which can
transparently treat such a folder structure as a single file, and there it
would behave as you expect. I hope this answers your question.
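
The layout described above can be sketched in plain Java, without Flink.
In this sketch each writer thread owns one part file inside the output
folder, so no locking is needed, and a reader gets the "single file" view by
concatenating the parts. The names PartFileSketch, writeParallel and readAll
are hypothetical, and the round-robin split and the numbered file names are
assumptions made for the illustration, not a description of Flink internals.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration, not Flink code.
public class PartFileSketch {

    /** Writes the records with `parallelism` threads, one part file per thread. */
    public static void writeParallel(List<String> records, Path outputDir, int parallelism)
            throws Exception {
        Files.createDirectories(outputDir);
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int task = 0; task < parallelism; task++) {
                final int taskIndex = task;
                pending.add(pool.submit(() -> {
                    // Round-robin slice: each writer owns a disjoint subset.
                    List<String> slice = new ArrayList<>();
                    for (int i = taskIndex; i < records.size(); i += parallelism) {
                        slice.add(records.get(i));
                    }
                    // Each writer has its own file, so writers never block each other.
                    Path part = outputDir.resolve(String.valueOf(taskIndex + 1));
                    Files.write(part, slice, StandardCharsets.UTF_8);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any write failure
            }
        } finally {
            pool.shutdown();
        }
    }

    /** Reads all part files, in file-name order, as one logical list of lines. */
    public static List<String> readAll(Path outputDir) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(outputDir)) {
            parts = files.sorted().collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("res");
        writeParallel(Arrays.asList("a", "b", "c", "d", "e"), dir, 3);
        System.out.println(readAll(dir));
    }
}
```

Note that the concatenated result contains every record but not necessarily
in the input order, which is the usual trade-off of parallel output.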

Best,

Marton

On Wed, Oct 29, 2014 at 8:31 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Hi to all,
> running the example at
> http://flink.incubator.apache.org/docs/0.7-incubating/local_execution.html
> I was thinking that the writeAsText on a local file was creating a text
> file on my local filesystem..instead it creates something similar to a
> sequence file (within a folder).
> This is something misleading I think...or the API name is wrong or this is
> a bug (IMHO).
> Btw..how can I modify the following program to write results in a single
> text file on my local filesystem?
>
> public static void main(String[] args) throws Exception {
>  ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>  DataSet<String> data = env.readTextFile("file:///tmp/res.txt");
>  data.filter(new FilterFunction<String>() {
>    public boolean filter(String value) {
>     return value.startsWith("http://");
>    }
>   }).writeAsText("file:///tmp/res.txt");
>   env.execute();}
>
> Best,
> Flavio
>
>
>