Posted to user@flink.apache.org by Habib Mostafaei <ha...@inet.tu-berlin.de> on 2019/10/29 13:27:19 UTC
low performance in running queries
Hi all,
I am running Flink on a standalone cluster and seeing very long
execution times for streaming queries such as WordCount on a fixed text
file. My VM runs Debian 10 with 16 CPU cores and 32GB of RAM, and the
text file is 2GB in size. With one JobManager and one TaskManager with a
25GB heap, counting the file took around two hours, while a simple
Python script does it in around 7 minutes. I am wondering what is wrong
with my setup. I also ran the experiment on a cluster with six
TaskManagers, but the execution time was still long, around 25 minutes.
Increasing the JVM heap size did not reduce the execution time. I
attached the log file and the Flink configuration file to this email.
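For readers who want a baseline to compare against, a single-threaded
word count of the kind mentioned above might look roughly like the
sketch below. This is an assumption for illustration only, not the
script used in this thread: the tokenizer (WORD_RE) and the function
names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: a "word" is any run of word characters.
WORD_RE = re.compile(r"\w+")


def count_words(lines):
    """Return a Counter of lower-cased words over an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


def count_words_in_file(path):
    """Count words in a file, streaming it line by line."""
    # Iterating over the handle avoids loading a multi-GB file into memory.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_words(handle)


if __name__ == "__main__":
    sample = ["This is some text", "this is more text"]
    for word, n in count_words(sample).most_common():
        print(word, n)
```

Note that such a script keeps one counter in memory and prints only the
final totals, so it does far less I/O than a job that emits a record
for every update.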
Best,
Habib
Re: low performance in running queries
Posted by Zhenghua Gao <do...@gmail.com>.
Hi,
I ran the streaming WordCount with a 2GB text file (/usr/share/dict/words
copied 400 times) last weekend and could not reproduce your result (it
took 16 minutes in my case).
But I found a clue that may help you:
The streaming WordCount job writes all intermediate results to your
output file (if specified) or to taskmanager.out.
That output is large (about 4GB in my case) and causes heavy disk writes.
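To illustrate why the output grows so large, here is a plain-Python
model (not Flink code) of a running count that emits an updated record
for every input word, next to a variant that emits only the final
totals. The function names and the sample sentence are made up for this
sketch.

```python
from collections import Counter


def streaming_word_count(words):
    """Yield (word, running_count) after every input word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        # One output record per input record, including intermediate counts.
        yield word, counts[word]


def batch_word_count(words):
    """Return only the final (word, count) pairs, sorted by word."""
    return sorted(Counter(words).items())


words = "to be or not to be".split()
streamed = list(streaming_word_count(words))
final = batch_word_count(words)

print(len(streamed), "records emitted by the streaming variant")
print(len(final), "records emitted by the batch variant")
```

In this model the streaming variant emits as many records as there are
input words, while the batch variant emits one record per distinct
word. For a file built from a short sentence copied many times, that
difference is enormous.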
*Best Regards,*
*Zhenghua Gao*
On Fri, Nov 1, 2019 at 4:40 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:
> I used streaming WordCount provided by Flink and the file contains text
> like "This is some text...". I just copied several times.
>
> Best,
>
> Habib
> On 11/1/2019 6:03 AM, Zhenghua Gao wrote:
>
> 2019-10-30 15:59:52,122 INFO org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.
>
> 2019-10-30 17:45:10,943 INFO org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.
>
> It is surprising that the source task took about 105 minutes to read a 2GB file.
>
> Could you share your code snippets and some sample lines from the 2GB file?
>
> I will try to reproduce your scenario and dig into the root cause.
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
> wrote:
>
>> I enclosed all logs from the run and for this run I used parallelism one.
>> However, for other runs I checked and found that all parallel workers were
>> working properly. Is there a simple way to get profiling information in
>> Flink?
>>
>> Best,
>>
>> Habib
>> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>
>> I think more runtime information would help figure out where the problem
>> is.
>> 1) how many parallelisms actually working
>> 2) the metrics for each operator
>> 3) the jvm profiling information, etc
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
>> wrote:
>>
>>> Thanks Gao for the reply. I used the parallelism parameter with
>>> different values like 6 and 8 but still the execution time is not
>>> comparable with a single threaded python script. What would be the
>>> reasonable value for the parallelism?
>>>
>>> Best,
>>>
>>> Habib
>>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>
>>> The reason might be the parallelism of your task is only 1, that's too
>>> low.
>>> See [1] to specify proper parallelism for your job, and the execution
>>> time should be reduced significantly.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
Re: low performance in running queries
Posted by Habib Mostafaei <ha...@inet.tu-berlin.de>.
I used the streaming WordCount example provided by Flink, and the file
contains text like "This is some text...". I just copied it several times.
Best,
Habib
On 11/1/2019 6:03 AM, Zhenghua Gao wrote:
> 2019-10-30 15:59:52,122 INFO org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.
> 2019-10-30 17:45:10,943 INFO org.apache.flink.runtime.taskmanager.Task - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.
> It is surprising that the source task took about 105 minutes to read a 2GB file.
> Could you share your code snippets and some sample lines from the 2GB file?
> I will try to reproduce your scenario and dig into the root cause.
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei
> <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>
> I enclosed all logs from the run and for this run I used
> parallelism one. However, for other runs I checked and found that
> all parallel workers were working properly. Is there a simple way
> to get profiling information in Flink?
>
> Best,
>
> Habib
>
> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>> I think more runtime information would help figure
>> out where the problem is.
>> 1) how many parallelisms actually working
>> 2) the metrics for each operator
>> 3) the jvm profiling information, etc
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei
>> <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>
>> Thanks Gao for the reply. I used the parallelism parameter
>> with different values like 6 and 8 but still the execution
>> time is not comparable with a single threaded python script.
>> What would be the reasonable value for the parallelism?
>>
>> Best,
>>
>> Habib
>>
>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>> The reason might be the parallelism of your task is only 1,
>>> that's too low.
>>> See [1] to specify proper parallelism for your job, and the
>>> execution time should be reduced significantly.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
>
Re: low performance in running queries
Posted by Zhenghua Gao <do...@gmail.com>.
2019-10-30 15:59:52,122 INFO
org.apache.flink.runtime.taskmanager.Task - Split
Reader: Custom File Source -> Flat Map (1/1)
(6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.
2019-10-30 17:45:10,943 INFO
org.apache.flink.runtime.taskmanager.Task - Split
Reader: Custom File Source -> Flat Map (1/1)
(6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.
It is surprising that the source task took about 105 minutes to read a 2GB file.
Could you share your code snippets and some sample lines from the 2GB file?
I will try to reproduce your scenario and dig into the root cause.
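The elapsed time between the two log lines can be checked directly from
their timestamps. A minimal sketch, assuming the timestamp layout shown
in the log excerpt (date, time, comma, milliseconds); the helper name
elapsed_minutes is made up for this example:

```python
from datetime import datetime

# Timestamp layout as it appears in the log lines: "2019-10-30 15:59:52,122".
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_minutes(start, end):
    """Return the number of minutes between two log timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return delta.total_seconds() / 60


# DEPLOYING -> RUNNING and RUNNING -> FINISHED timestamps of the source task.
minutes = elapsed_minutes("2019-10-30 15:59:52,122", "2019-10-30 17:45:10,943")
print(f"{minutes:.1f} minutes")
```

For the two timestamps above this gives roughly 105 minutes between the
task switching to RUNNING and switching to FINISHED.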
*Best Regards,*
*Zhenghua Gao*
On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:
> I enclosed all logs from the run and for this run I used parallelism one.
> However, for other runs I checked and found that all parallel workers were
> working properly. Is there a simple way to get profiling information in
> Flink?
>
> Best,
>
> Habib
> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>
> I think more runtime information would help figure out where the problem
> is.
> 1) how many parallelisms actually working
> 2) the metrics for each operator
> 3) the jvm profiling information, etc
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
> wrote:
>
>> Thanks Gao for the reply. I used the parallelism parameter with different
>> values like 6 and 8 but still the execution time is not comparable with a
>> single threaded python script. What would be the reasonable value for the
>> parallelism?
>>
>> Best,
>>
>> Habib
>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>
>> The reason might be the parallelism of your task is only 1, that's too
>> low.
>> See [1] to specify proper parallelism for your job, and the execution
>> time should be reduced significantly.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>
>> *Best Regards,*
>> *Zhenghua Gao*
Re: low performance in running queries
Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,
Unfortunately your VisualVM snapshot doesn’t contain the profiler output. It should look like this [1].
> Checking the timeline of execution shows that the source operation is done in less than a second while Map and Reduce operations take long running time.
It could well be that the overhead comes, for example, from state accesses, especially if you are using RocksDB. It would still be interesting to see the call stack that is using the most CPU time.
Piotrek
[1] https://i.stack.imgur.com/yTdZ5.png
> On 4 Nov 2019, at 14:35, Habib Mostafaei <ha...@inet.tu-berlin.de> wrote:
>
> Hi,
>
> On 11/1/2019 4:40 PM, Piotr Nowojski wrote:
>> Hi,
>>
>> More important would be the code profiling output. I think VisualVM allows to share the code profiling result as “snapshots”? If you could analyse or share this, it would be helpful.
> Enclosed is a snapshot of VisualVM.
>>
>>
>> From the attached screenshot the only thing that is visible is that there are no GC issues, and secondly the application is running only on one (out of 10?) CPU cores. Which hints one obvious way how to improve the performance - scale out. However the WordCount example might not be the best for this, as I’m pretty sure its source is fundamentally not parallel.
> Yes, you are right that the source is not parallel. Checking the timeline of execution shows that the source operation is done in less than a second while Map and Reduce operations take long running time.
>
> Habib
>
>>
>> Piotrek
>>
>>> On 1 Nov 2019, at 15:57, Habib Mostafaei <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>
>>> Hi Piotrek,
>>>
>>> Thanks for the list of profilers. I used VisualVM and here is the resource usage for taskManager.
>>>
>>> <imiafpejagonadce.png>
>>>
>>> Habib
>>>
>>>
>>>
>>> On 11/1/2019 9:48 AM, Piotr Nowojski wrote:
>>>> Hi,
>>>>
>>>> > Is there a simple way to get profiling information in Flink?
>>>>
>>>> Flink doesn’t provide any special tooling for that. Just use your chosen profiler, for example: Oracle’s Mission Control (free on non production clusters, no need to install anything if already using Oracle’s JVM), VisualVM (I think free), YourKit (paid). For each one of them there is a plenty of online support how to use them both for local and remote profiling.
>>>>
>>>> Piotrek
>>>>
>>>>> On 31 Oct 2019, at 14:05, Habib Mostafaei <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>>>
>>>>> I enclosed all logs from the run and for this run I used parallelism one. However, for other runs I checked and found that all parallel workers were working properly. Is there a simple way to get profiling information in Flink?
>>>>>
>>>>> Best,
>>>>>
>>>>> Habib
>>>>>
>>>>> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>>>>> I think more runtime information would help figure out where the problem is.
>>>>>> 1) how many parallelisms actually working
>>>>>> 2) the metrics for each operator
>>>>>> 3) the jvm profiling information, etc
>>>>>>
>>>>>> Best Regards,
>>>>>> Zhenghua Gao
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>>>> Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8 but still the execution time is not comparable with a single threaded python script. What would be the reasonable value for the parallelism?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Habib
>>>>>>
>>>>>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>>>>> The reason might be the parallelism of your task is only 1, that's too low.
>>>>>>> See [1] to specify proper parallelism for your job, and the execution time should be reduced significantly.
>>>>>>>
>>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html <https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Zhenghua Gao
>>>>>
>>>>> <flink-xxx-client-xxx.log><flink-xxx-standalonesession-0-xxx.log><flink-xxx-taskexecutor-0-xxx.log>
>>>>
>
> <application-1572869697842.apps>
Re: low performance in running queries
Posted by Habib Mostafaei <ha...@inet.tu-berlin.de>.
Hi,
On 11/1/2019 4:40 PM, Piotr Nowojski wrote:
> Hi,
>
> More important would be the code profiling output. I think VisualVM
> allows to share the code profiling result as “snapshots”? If you could
> analyse or share this, it would be helpful.
Enclosed is a snapshot of VisualVM.
>
> From the attached screenshot the only thing that is visible is that
> there are no GC issues, and secondly the application is running only
> on one (out of 10?) CPU cores. Which hints one obvious way how to
> improve the performance - scale out. However the WordCount example
> might not be the best for this, as I’m pretty sure its source is
> fundamentally not parallel.
Yes, you are right that the source is not parallel. The execution
timeline shows that the source operation finishes in less than a
second, while the Map and Reduce operations run for a long time.
Habib
>
> Piotrek
>
>> On 1 Nov 2019, at 15:57, Habib Mostafaei <habib@inet.tu-berlin.de
>> <ma...@inet.tu-berlin.de>> wrote:
>>
>> Hi Piotrek,
>>
>> Thanks for the list of profilers. I used VisualVM and here is the
>> resource usage for taskManager.
>>
>> <imiafpejagonadce.png>
>>
>> Habib
>>
>>
>> On 11/1/2019 9:48 AM, Piotr Nowojski wrote:
>>> Hi,
>>>
>>> > Is there a simple way to get profiling information in Flink?
>>>
>>> Flink doesn’t provide any special tooling for that. Just use your
>>> chosen profiler, for example: Oracle’s Mission Control (free on non
>>> production clusters, no need to install anything if already using
>>> Oracle’s JVM), VisualVM (I think free), YourKit (paid). For each one
>>> of them there is a plenty of online support how to use them both for
>>> local and remote profiling.
>>>
>>> Piotrek
>>>
>>>> On 31 Oct 2019, at 14:05, Habib Mostafaei <habib@inet.tu-berlin.de
>>>> <ma...@inet.tu-berlin.de>> wrote:
>>>>
>>>> I enclosed all logs from the run and for this run I used
>>>> parallelism one. However, for other runs I checked and found that
>>>> all parallel workers were working properly. Is there a simple way
>>>> to get profiling information in Flink?
>>>>
>>>> Best,
>>>>
>>>> Habib
>>>>
>>>> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>>>> I think more runtime information would help figure
>>>>> out where the problem is.
>>>>> 1) how many parallelisms actually working
>>>>> 2) the metrics for each operator
>>>>> 3) the jvm profiling information, etc
>>>>>
>>>>> *Best Regards,*
>>>>> *Zhenghua Gao*
>>>>>
>>>>>
>>>>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei
>>>>> <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>>>
>>>>> Thanks Gao for the reply. I used the parallelism parameter
>>>>> with different values like 6 and 8 but still the execution
>>>>> time is not comparable with a single threaded python script.
>>>>> What would be the reasonable value for the parallelism?
>>>>>
>>>>> Best,
>>>>>
>>>>> Habib
>>>>>
>>>>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>>>> The reason might be the parallelism of your task is only 1,
>>>>>> that's too low.
>>>>>> See [1] to specify proper parallelism for your job, and the
>>>>>> execution time should be reduced significantly.
>>>>>>
>>>>>> [1]
>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>>>>
>>>>>> *Best Regards,*
>>>>>> *Zhenghua Gao*
>>>>
>>>> <flink-xxx-client-xxx.log><flink-xxx-standalonesession-0-xxx.log><flink-xxx-taskexecutor-0-xxx.log>
>>>
Re: low performance in running queries
Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,
More important would be the code profiling output. I think VisualVM allows you to share the code profiling result as “snapshots”? If you could analyse or share this, it would be helpful.
From the attached screenshot, the only things visible are that there are no GC issues and that the application is running on only one (out of 10?) CPU cores. That hints at one obvious way to improve the performance: scale out. However, the WordCount example might not be the best for this, as I’m pretty sure its source is fundamentally not parallel.
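As a rough way to reason about how much scaling out can help when part
of the pipeline (here, the source) is not parallel, Amdahl's law gives
an upper bound on the speedup. This is a generic back-of-the-envelope
model, not something measured on this job, and the parallel fractions
below are made-up values for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical fractions of the runtime that can be parallelised.
for fraction in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 6, 16)]
    print(fraction, speedups)
```

If most of the time is spent in the parallel Map and Reduce stages, the
bound stays close to the number of workers; the larger the serial
share, the faster the benefit of extra workers flattens out.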
Piotrek
> On 1 Nov 2019, at 15:57, Habib Mostafaei <ha...@inet.tu-berlin.de> wrote:
>
> Hi Piotrek,
>
> Thanks for the list of profilers. I used VisualVM and here is the resource usage for taskManager.
>
> <imiafpejagonadce.png>
>
> Habib
>
>
>
> On 11/1/2019 9:48 AM, Piotr Nowojski wrote:
>> Hi,
>>
>> > Is there a simple way to get profiling information in Flink?
>>
>> Flink doesn’t provide any special tooling for that. Just use your chosen profiler, for example: Oracle’s Mission Control (free on non production clusters, no need to install anything if already using Oracle’s JVM), VisualVM (I think free), YourKit (paid). For each one of them there is a plenty of online support how to use them both for local and remote profiling.
>>
>> Piotrek
>>
>>> On 31 Oct 2019, at 14:05, Habib Mostafaei <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>
>>> I enclosed all logs from the run and for this run I used parallelism one. However, for other runs I checked and found that all parallel workers were working properly. Is there a simple way to get profiling information in Flink?
>>>
>>> Best,
>>>
>>> Habib
>>>
>>> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>>> I think more runtime information would help figure out where the problem is.
>>>> 1) how many parallelisms actually working
>>>> 2) the metrics for each operator
>>>> 3) the jvm profiling information, etc
>>>>
>>>> Best Regards,
>>>> Zhenghua Gao
>>>>
>>>>
>>>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>> Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8 but still the execution time is not comparable with a single threaded python script. What would be the reasonable value for the parallelism?
>>>>
>>>> Best,
>>>>
>>>> Habib
>>>>
>>>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>>> The reason might be the parallelism of your task is only 1, that's too low.
>>>>> See [1] to specify proper parallelism for your job, and the execution time should be reduced significantly.
>>>>>
>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html <https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html>
>>>>>
>>>>> Best Regards,
>>>>> Zhenghua Gao
>>>
>>> <flink-xxx-client-xxx.log><flink-xxx-standalonesession-0-xxx.log><flink-xxx-taskexecutor-0-xxx.log>
>>
> --
> Habib Mostafaei, Ph.D.
> Postdoctoral researcher
> TU Berlin,
> FG INET, MAR 4.003
> Marchstraße 23, 10587 Berlin
Re: low performance in running queries
Posted by Habib Mostafaei <ha...@inet.tu-berlin.de>.
Hi Piotrek,
Thanks for the list of profilers. I used VisualVM and here is the
resource usage for taskManager.
Habib
On 11/1/2019 9:48 AM, Piotr Nowojski wrote:
> Hi,
>
> > Is there a simple way to get profiling information in Flink?
>
> Flink doesn’t provide any special tooling for that. Just use your
> chosen profiler, for example: Oracle’s Mission Control (free on non
> production clusters, no need to install anything if already using
> Oracle’s JVM), VisualVM (I think free), YourKit (paid). For each one
> of them there is a plenty of online support how to use them both for
> local and remote profiling.
>
> Piotrek
>
>> On 31 Oct 2019, at 14:05, Habib Mostafaei <habib@inet.tu-berlin.de
>> <ma...@inet.tu-berlin.de>> wrote:
>>
>> I enclosed all logs from the run and for this run I used parallelism
>> one. However, for other runs I checked and found that all parallel
>> workers were working properly. Is there a simple way to get profiling
>> information in Flink?
>>
>> Best,
>>
>> Habib
>>
>> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>> I think more runtime information would help figure
>>> out where the problem is.
>>> 1) how many parallelisms actually working
>>> 2) the metrics for each operator
>>> 3) the jvm profiling information, etc
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
>>>
>>>
>>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei
>>> <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>>
>>> Thanks Gao for the reply. I used the parallelism parameter with
>>> different values like 6 and 8 but still the execution time is
>>> not comparable with a single threaded python script. What would
>>> be the reasonable value for the parallelism?
>>>
>>> Best,
>>>
>>> Habib
>>>
>>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>> The reason might be the parallelism of your task is only 1,
>>>> that's too low.
>>>> See [1] to specify proper parallelism for your job, and the
>>>> execution time should be reduced significantly.
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>>
>>>> *Best Regards,*
>>>> *Zhenghua Gao*
>>
>> <flink-xxx-client-xxx.log><flink-xxx-standalonesession-0-xxx.log><flink-xxx-taskexecutor-0-xxx.log>
>
--
Habib Mostafaei, Ph.D.
Postdoctoral researcher
TU Berlin,
FG INET, MAR 4.003
Marchstraße 23, 10587 Berlin
Re: low performance in running queries
Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,
> Is there a simple way to get profiling information in Flink?
Flink doesn’t provide any special tooling for that. Just use your profiler of choice, for example: Oracle’s Mission Control (free on non-production clusters, no need to install anything if you are already using Oracle’s JVM), VisualVM (free, I think), or YourKit (paid). For each of them there is plenty of online documentation on how to use them for both local and remote profiling.
Piotrek
> On 31 Oct 2019, at 14:05, Habib Mostafaei <ha...@inet.tu-berlin.de> wrote:
>
> I enclosed all logs from the run and for this run I used parallelism one. However, for other runs I checked and found that all parallel workers were working properly. Is there a simple way to get profiling information in Flink?
>
> Best,
>
> Habib
>
> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>> I think more runtime information would help figure out where the problem is.
>> 1) how many parallelisms actually working
>> 2) the metrics for each operator
>> 3) the jvm profiling information, etc
>>
>> Best Regards,
>> Zhenghua Gao
>>
>>
>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>> Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8 but still the execution time is not comparable with a single threaded python script. What would be the reasonable value for the parallelism?
>>
>> Best,
>>
>> Habib
>>
>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>> The reason might be the parallelism of your task is only 1, that's too low.
>>> See [1] to specify proper parallelism for your job, and the execution time should be reduced significantly.
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html <https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html>
>>>
>>> Best Regards,
>>> Zhenghua Gao
>
> <flink-xxx-client-xxx.log><flink-xxx-standalonesession-0-xxx.log><flink-xxx-taskexecutor-0-xxx.log>
Re: low performance in running queries
Posted by Habib Mostafaei <ha...@inet.tu-berlin.de>.
I have enclosed all logs from the run; for this run I used a parallelism
of one. For other runs, however, I checked and found that all parallel
workers were working properly. Is there a simple way to get profiling
information in Flink?
Best,
Habib
On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
> I think more runtime information would help figure out where the problem is.
> 1) how many parallelisms actually working
> 2) the metrics for each operator
> 3) the jvm profiling information, etc
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei
> <habib@inet.tu-berlin.de> wrote:
>
> Thanks Gao for the reply. I used the parallelism parameter with
> different values like 6 and 8 but still the execution time is not
> comparable with a single threaded python script. What would be the
> reasonable value for the parallelism?
>
> Best,
>
> Habib
>
> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>> The reason might be the parallelism of your task is only 1,
>> that's too low.
>> See [1] to specify proper parallelism for your job, and the
>> execution time should be reduced significantly.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei
>> <habib@inet.tu-berlin.de> wrote:
>>
>> Hi all,
>>
>> I am running Flink on a standalone cluster and getting very long
>> execution time for the streaming queries like WordCount for a
>> fixed text
>> file. My VM runs on a Debian 10 with 16 cpu cores and 32GB of
>> RAM. I
>> have a text file with size of 2GB. When I run the Flink on a
>> standalone
>> cluster, i.e., one JobManager and one taskManager with 25GB
>> of heapsize,
>> it took around two hours to finish counting this file while a
>> simple
>> python script can do it in around 7 minutes. Just wondering
>> what is
>> wrong with my setup. I ran the experiments on a cluster with six
>> taskManagers, but I still get very long execution time like
>> 25 minutes
>> or so. I tried to increase the JVM heap size to have lower
>> execution
>> time but it did not help. I attached the log file and the Flink
>> configuration file to this email.
>>
>> Best,
>>
>> Habib
>>
Re: low performance in running queries
Posted by Zhenghua Gao <do...@gmail.com>.
I think more runtime information would help figure out where the problem is:
1) how many parallel subtasks are actually working
2) the metrics for each operator
3) the JVM profiling information, etc.
*Best Regards,*
*Zhenghua Gao*
On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:
> Thanks Gao for the reply. I used the parallelism parameter with different
> values like 6 and 8 but still the execution time is not comparable with a
> single threaded python script. What would be the reasonable value for the
> parallelism?
>
> Best,
>
> Habib
> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>
> The reason might be the parallelism of your task is only 1, that's too
> low.
> See [1] to specify proper parallelism for your job, and the execution
> time should be reduced significantly.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
> wrote:
>
>> Hi all,
>>
>> I am running Flink on a standalone cluster and getting very long
>> execution time for the streaming queries like WordCount for a fixed text
>> file. My VM runs on a Debian 10 with 16 cpu cores and 32GB of RAM. I
>> have a text file with size of 2GB. When I run the Flink on a standalone
>> cluster, i.e., one JobManager and one taskManager with 25GB of heapsize,
>> it took around two hours to finish counting this file while a simple
>> python script can do it in around 7 minutes. Just wondering what is
>> wrong with my setup. I ran the experiments on a cluster with six
>> taskManagers, but I still get very long execution time like 25 minutes
>> or so. I tried to increase the JVM heap size to have lower execution
>> time but it did not help. I attached the log file and the Flink
>> configuration file to this email.
>>
>> Best,
>>
>> Habib
>>
>> --
> Habib Mostafaei, Ph.D.
> Postdoctoral researcher
> TU Berlin,
> FG INET, MAR 4.003
> Marchstraße 23, 10587 Berlin
>
>
Re: low performance in running queries
Posted by Piotr Nowojski <pi...@ververica.com>.
Hi,
I would also suggest simply attaching a code profiler to the process during those 2 hours and gathering some results. It might answer some questions about what is taking so long.
Piotrek
> On 30 Oct 2019, at 15:11, Chris Miller <ch...@gmail.com> wrote:
>
> I haven't run any benchmarks with Flink or even used it enough to directly help with your question, however I suspect that the following article might be relevant:
>
> http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/
>
> Given that the computation you're performing is trivial, it's possible that the additional overhead of serialisation, interprocess communication, state management, etc. that distributed systems like Flink require is dominating the runtime here. 2 hours (or even 25 minutes) still seems too long to me, however, so hopefully it really is just a configuration issue of some sort. Either way, if you do figure this out, or if anyone with good knowledge of the article above in relation to Flink is able to give their thoughts, I'd be very interested in hearing more.
>
> Regards,
> Chris
>
>
> ------ Original Message ------
> From: "Habib Mostafaei" <habib@inet.tu-berlin.de>
> To: "Zhenghua Gao" <docete@gmail.com>
> Cc: "user" <user@flink.apache.org>; "Georgios Smaragdakis" <georgios@inet.tu-berlin.de>; "Niklas Semmler" <niklas@inet.tu-berlin.de>
> Sent: 30/10/2019 12:25:28
> Subject: Re: low performance in running queries
>
>> Thanks Gao for the reply. I used the parallelism parameter with different values like 6 and 8 but still the execution time is not comparable with a single threaded python script. What would be the reasonable value for the parallelism?
>>
>> Best,
>>
>> Habib
>>
>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>> The reason might be the parallelism of your task is only 1, that's too low.
>>> See [1] to specify proper parallelism for your job, and the execution time should be reduced significantly.
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>
>>> Best Regards,
>>> Zhenghua Gao
>>>
>>>
>>> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <habib@inet.tu-berlin.de> wrote:
>>> Hi all,
>>>
>>> I am running Flink on a standalone cluster and getting very long
>>> execution time for the streaming queries like WordCount for a fixed text
>>> file. My VM runs on a Debian 10 with 16 cpu cores and 32GB of RAM. I
>>> have a text file with size of 2GB. When I run the Flink on a standalone
>>> cluster, i.e., one JobManager and one taskManager with 25GB of heapsize,
>>> it took around two hours to finish counting this file while a simple
>>> python script can do it in around 7 minutes. Just wondering what is
>>> wrong with my setup. I ran the experiments on a cluster with six
>>> taskManagers, but I still get very long execution time like 25 minutes
>>> or so. I tried to increase the JVM heap size to have lower execution
>>> time but it did not help. I attached the log file and the Flink
>>> configuration file to this email.
>>>
>>> Best,
>>>
>>> Habib
>>>
>> --
>> Habib Mostafaei, Ph.D.
>> Postdoctoral researcher
>> TU Berlin,
>> FG INET, MAR 4.003
>> Marchstraße 23, 10587 Berlin
Re: low performance in running queries
Posted by Chris Miller <ch...@gmail.com>.
I haven't run any benchmarks with Flink or even used it enough to
directly help with your question; however, I suspect that the following
article might be relevant:
http://dsrg.pdos.csail.mit.edu/2016/06/26/scalability-cost/
Given that the computation you're performing is trivial, it's possible
that the additional overhead of serialisation, interprocess
communication, state management, etc. that distributed systems like
Flink require is dominating the runtime here. 2 hours (or even 25
minutes) still seems too long to me, however, so hopefully it really is
just a configuration issue of some sort. Either way, if you do figure
this out, or if anyone with good knowledge of the article above in
relation to Flink is able to give their thoughts, I'd be very interested
in hearing more.
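For comparing numbers, it may help to pin down what the single-threaded
baseline actually does. The thread does not include the Python script
that was used, so the following is only a minimal sketch of what such a
baseline could look like; the function names and the word-splitting rule
(lowercase, split on non-word characters) are assumptions, not the
script from the original post.

```python
import re
import time
from collections import Counter

# Assumed tokenisation rule: runs of word characters, compared in lowercase.
WORD_RE = re.compile(r"\w+")


def count_words(lines):
    """Count lowercase word tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


def count_file(path):
    """Stream the file line by line and return (counts, elapsed seconds).

    Iterating over the file object keeps memory use bounded by the number
    of distinct words rather than by the size of the input file.
    """
    start = time.perf_counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = count_words(f)
    return counts, time.perf_counter() - start
```

Note that a baseline like this writes only the final totals, so it does
far less output work than a job that emits a record for every update.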
Regards,
Chris
------ Original Message ------
From: "Habib Mostafaei" <ha...@inet.tu-berlin.de>
To: "Zhenghua Gao" <do...@gmail.com>
Cc: "user" <us...@flink.apache.org>; "Georgios Smaragdakis"
<ge...@inet.tu-berlin.de>; "Niklas Semmler"
<ni...@inet.tu-berlin.de>
Sent: 30/10/2019 12:25:28
Subject: Re: low performance in running queries
>Thanks Gao for the reply. I used the parallelism parameter with
>different values like 6 and 8 but still the execution time is not
>comparable with a single threaded python script. What would be the
>reasonable value for the parallelism?
>
>Best,
>
>Habib
>
>On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>The reason might be the parallelism of your task is only 1, that's too
>>low.
>>See [1] to specify proper parallelism for your job, and the execution
>>time should be reduced significantly.
>>
>>[1]
>>https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>
>>Best Regards,
>>Zhenghua Gao
>>
>>
>>On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei
>><ha...@inet.tu-berlin.de> wrote:
>>>Hi all,
>>>
>>>I am running Flink on a standalone cluster and getting very long
>>>execution time for the streaming queries like WordCount for a fixed
>>>text
>>>file. My VM runs on a Debian 10 with 16 cpu cores and 32GB of RAM. I
>>>have a text file with size of 2GB. When I run the Flink on a
>>>standalone
>>>cluster, i.e., one JobManager and one taskManager with 25GB of
>>>heapsize,
>>>it took around two hours to finish counting this file while a simple
>>>python script can do it in around 7 minutes. Just wondering what is
>>>wrong with my setup. I ran the experiments on a cluster with six
>>>taskManagers, but I still get very long execution time like 25
>>>minutes
>>>or so. I tried to increase the JVM heap size to have lower execution
>>>time but it did not help. I attached the log file and the Flink
>>>configuration file to this email.
>>>
>>>Best,
>>>
>>>Habib
>>>
>--
>Habib Mostafaei, Ph.D.
>Postdoctoral researcher
>TU Berlin,
>FG INET, MAR 4.003
>Marchstraße 23, 10587 Berlin
Re: low performance in running queries
Posted by Habib Mostafaei <ha...@inet.tu-berlin.de>.
Thanks Gao for the reply. I tried the parallelism parameter with
different values such as 6 and 8, but the execution time is still not
comparable with that of a single-threaded Python script. What would be
a reasonable value for the parallelism?
Best,
Habib
On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
> The reason might be the parallelism of your task is only 1, that's too
> low.
> See [1] to specify proper parallelism for your job, and the execution
> time should be reduced significantly.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei
> <habib@inet.tu-berlin.de> wrote:
>
> Hi all,
>
> I am running Flink on a standalone cluster and getting very long
> execution time for the streaming queries like WordCount for a
> fixed text
> file. My VM runs on a Debian 10 with 16 cpu cores and 32GB of RAM. I
> have a text file with size of 2GB. When I run the Flink on a
> standalone
> cluster, i.e., one JobManager and one taskManager with 25GB of
> heapsize,
> it took around two hours to finish counting this file while a simple
> python script can do it in around 7 minutes. Just wondering what is
> wrong with my setup. I ran the experiments on a cluster with six
> taskManagers, but I still get very long execution time like 25
> minutes
> or so. I tried to increase the JVM heap size to have lower execution
> time but it did not help. I attached the log file and the Flink
> configuration file to this email.
>
> Best,
>
> Habib
>
--
Habib Mostafaei, Ph.D.
Postdoctoral researcher
TU Berlin,
FG INET, MAR 4.003
Marchstraße 23, 10587 Berlin
Re: low performance in running queries
Posted by Zhenghua Gao <do...@gmail.com>.
The reason might be that the parallelism of your task is only 1, which is
too low. See [1] for how to specify a proper parallelism for your job; the
execution time should then be reduced significantly.
[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
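To illustrate what a higher parallelism can and cannot buy for a keyed
word count, here is a conceptual sketch in plain Python. It is not
Flink's API, and Flink's real key-to-subtask assignment works
differently; the names partition_for, keyed_counts and merge are
hypothetical and exist only for this illustration. The idea shown is
that every occurrence of a word is routed to the same partition, so each
partition can count independently.

```python
import zlib
from collections import Counter


def partition_for(word, parallelism):
    """Pick a partition deterministically from the word (the key)."""
    return zlib.crc32(word.encode("utf-8")) % parallelism


def keyed_counts(words, parallelism):
    """Route each word to exactly one partition and count it there."""
    partitions = [Counter() for _ in range(parallelism)]
    for word in words:
        partitions[partition_for(word, parallelism)][word] += 1
    return partitions


def merge(partitions):
    """Combine per-partition counts into one overall result."""
    total = Counter()
    for part in partitions:
        total.update(part)
    return total
```

One consequence of routing by key: if the input contains only a handful
of distinct words, at most that many partitions receive any work, so
raising the parallelism further is unlikely to speed up the counting
step by itself.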
*Best Regards,*
*Zhenghua Gao*
On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:
> Hi all,
>
> I am running Flink on a standalone cluster and getting very long
> execution time for the streaming queries like WordCount for a fixed text
> file. My VM runs on a Debian 10 with 16 cpu cores and 32GB of RAM. I
> have a text file with size of 2GB. When I run the Flink on a standalone
> cluster, i.e., one JobManager and one taskManager with 25GB of heapsize,
> it took around two hours to finish counting this file while a simple
> python script can do it in around 7 minutes. Just wondering what is
> wrong with my setup. I ran the experiments on a cluster with six
> taskManagers, but I still get very long execution time like 25 minutes
> or so. I tried to increase the JVM heap size to have lower execution
> time but it did not help. I attached the log file and the Flink
> configuration file to this email.
>
> Best,
>
> Habib
>
>