Posted to user@flink.apache.org by Zhenghua Gao <do...@gmail.com> on 2019/11/01 05:03:36 UTC

Re: low performance in running queries

2019-10-30 15:59:52,122 INFO
org.apache.flink.runtime.taskmanager.Task                     - Split
Reader: Custom File Source -> Flat Map (1/1)
(6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.

2019-10-30 17:45:10,943 INFO
org.apache.flink.runtime.taskmanager.Task                     - Split
Reader: Custom File Source -> Flat Map (1/1)
(6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.


It's surprising that the source task took about 105 minutes to read a 2GB file.

Could you give me your code snippets and some sample lines of the 2GB file?

I will try to reproduce your scenario and dig into the root cause.
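The elapsed time between the two log lines above can be checked directly. A small standalone sketch using only the Python standard library, with the timestamps copied verbatim from the log:

```python
from datetime import datetime

# Timestamps copied from the two taskmanager log lines above.
FMT = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2019-10-30 15:59:52,122", FMT)
end = datetime.strptime("2019-10-30 17:45:10,943", FMT)

elapsed = end - start
print(elapsed)                               # 1:45:18.821000
print(round(elapsed.total_seconds() / 60))   # 105 (minutes)
```

So the source task spent roughly 105 minutes between switching to RUNNING and switching to FINISHED.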


*Best Regards,*
*Zhenghua Gao*


On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:

> I enclosed all logs from the run and for this run I used parallelism one.
> However, for other runs I checked and found that all parallel workers were
> working properly. Is there a simple way to get profiling information in
> Flink?
>
> Best,
>
> Habib
> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>
> I think more runtime information would help figure out where the problem
> is:
> 1) how many parallel instances are actually working
> 2) the metrics for each operator
> 3) the JVM profiling information, etc.
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
> wrote:
>
>> Thanks Gao for the reply. I used the parallelism parameter with different
>> values like 6 and 8, but the execution time is still not comparable with a
>> single-threaded Python script. What would be a reasonable value for the
>> parallelism?
>>
>> Best,
>>
>> Habib
>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>
>> The reason might be that the parallelism of your task is only 1, which is
>> too low.
>> See [1] to specify a proper parallelism for your job; the execution
>> time should then be reduced significantly.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am running Flink on a standalone cluster and getting very long
>>> execution times for streaming queries like WordCount on a fixed text
>>> file. My VM runs Debian 10 with 16 CPU cores and 32GB of RAM. I have a
>>> 2GB text file. When I run Flink on a standalone cluster, i.e., one
>>> JobManager and one TaskManager with 25GB of heap size, it took around
>>> two hours to finish counting this file, while a simple Python script
>>> can do it in around 7 minutes. Just wondering what is wrong with my
>>> setup. I ran the experiments on a cluster with six TaskManagers, but I
>>> still get a very long execution time, around 25 minutes. I tried
>>> increasing the JVM heap size to lower the execution time, but it did
>>> not help. I attached the log file and the Flink configuration file to
>>> this email.
>>>
>>> Best,
>>>
>>> Habib
>>>
>>>
>
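For reference, a single-threaded Python word count of the kind mentioned in the thread might look roughly like this. This is a hypothetical sketch (the actual script was not shared); it streams the file line by line so only the per-word totals stay in memory:

```python
from collections import Counter

def count_words(path):
    """Single-pass, single-threaded word count over a text file."""
    counts = Counter()
    with open(path, "r", errors="replace") as f:
        for line in f:
            # Stream line by line; the 2GB file is never loaded at once.
            counts.update(line.split())
    return counts
```

For a 2GB input this does one sequential read and keeps only the distinct-word counts, which is why such a script can finish quickly compared to a misconfigured cluster job.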

Re: low performance in running queries

Posted by Zhenghua Gao <do...@gmail.com>.
Hi,

I ran the streaming WordCount with a 2GB text file (copied
/usr/share/dict/words 400 times) last weekend and didn't reproduce your
result (16 minutes in my case).
But I found some clues that may help you:

The streaming WordCount job writes all intermediate results to your
output file (if specified) or to taskmanager.out.
That output is large (about 4GB in my case) and causes heavy disk writes.
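To see why the output gets so large: a non-windowed streaming word count emits an updated (word, count) record for every incoming word, so the sink receives roughly one record per input word, and the output can easily exceed the input in size. A small standalone Python sketch of that behavior (not Flink code):

```python
from collections import Counter

def streaming_word_count(words):
    """Emit an updated (word, count) record for each incoming word,
    mimicking a non-windowed streaming aggregation."""
    counts = Counter()
    emitted = []
    for w in words:
        counts[w] += 1
        emitted.append((w, counts[w]))  # one update record per input word
    return emitted

records = streaming_word_count("to be or not to be".split())
print(len(records))   # 6, i.e. one output record per input word
print(records[-1])    # ('be', 2)
```

Writing every one of these updates to taskmanager.out (or an output file) is what drives the heavy disk writes.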


*Best Regards,*
*Zhenghua Gao*


On Fri, Nov 1, 2019 at 4:40 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
wrote:

> I used the streaming WordCount provided by Flink, and the file contains text
> like "This is some text...". I just copied it several times.
>
> Best,
>
> Habib
> On 11/1/2019 6:03 AM, Zhenghua Gao wrote:
>
> 2019-10-30 15:59:52,122 INFO  org.apache.flink.runtime.taskmanager.Task                     - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.
>
> 2019-10-30 17:45:10,943 INFO  org.apache.flink.runtime.taskmanager.Task                     - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.
>
> It's surprising that the source task took about 105 minutes to read a 2GB file.
>
> Could you give me your code snippets and some sample lines of the 2GB file?
>
> I will try to reproduce your scenario and dig into the root cause.
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
> wrote:
>
>> I enclosed all logs from the run and for this run I used parallelism one.
>> However, for other runs I checked and found that all parallel workers were
>> working properly. Is there a simple way to get profiling information in
>> Flink?
>>
>> Best,
>>
>> Habib
>> On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>
>> I think more runtime information would help figure out where the problem
>> is:
>> 1) how many parallel instances are actually working
>> 2) the metrics for each operator
>> 3) the JVM profiling information, etc.
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
>> wrote:
>>
>>> Thanks Gao for the reply. I used the parallelism parameter with
>>> different values like 6 and 8, but the execution time is still not
>>> comparable with a single-threaded Python script. What would be a
>>> reasonable value for the parallelism?
>>>
>>> Best,
>>>
>>> Habib
>>> On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>
>>> The reason might be that the parallelism of your task is only 1, which
>>> is too low.
>>> See [1] to specify a proper parallelism for your job; the execution
>>> time should then be reduced significantly.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>
>>> *Best Regards,*
>>> *Zhenghua Gao*
>>>
>>>
>>> On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei <ha...@inet.tu-berlin.de>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am running Flink on a standalone cluster and getting very long
>>>> execution times for streaming queries like WordCount on a fixed text
>>>> file. My VM runs Debian 10 with 16 CPU cores and 32GB of RAM. I have a
>>>> 2GB text file. When I run Flink on a standalone cluster, i.e., one
>>>> JobManager and one TaskManager with 25GB of heap size, it took around
>>>> two hours to finish counting this file, while a simple Python script
>>>> can do it in around 7 minutes. Just wondering what is wrong with my
>>>> setup. I ran the experiments on a cluster with six TaskManagers, but I
>>>> still get a very long execution time, around 25 minutes. I tried
>>>> increasing the JVM heap size to lower the execution time, but it did
>>>> not help. I attached the log file and the Flink configuration file to
>>>> this email.
>>>>
>>>> Best,
>>>>
>>>> Habib
>>>>
>>>>
>>

Re: low performance in running queries

Posted by Habib Mostafaei <ha...@inet.tu-berlin.de>.
I used the streaming WordCount provided by Flink, and the file contains text
like "This is some text...". I just copied it several times.

Best,

Habib

On 11/1/2019 6:03 AM, Zhenghua Gao wrote:
> 2019-10-30 15:59:52,122 INFO  org.apache.flink.runtime.taskmanager.Task                     - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from DEPLOYING to RUNNING.
> 2019-10-30 17:45:10,943 INFO  org.apache.flink.runtime.taskmanager.Task                     - Split Reader: Custom File Source -> Flat Map (1/1) (6a17c410c3e36f524bb774d2dffed4a4) switched from RUNNING to FINISHED.
> It's surprising that the source task took about 105 minutes to read a 2GB file.
> Could you give me your code snippets and some sample lines of the 2GB file?
> I will try to reproduce your scenario and dig into the root cause.
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Thu, Oct 31, 2019 at 9:05 PM Habib Mostafaei 
> <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>
>     I enclosed all logs from the run and for this run I used
>     parallelism one. However, for other runs I checked and found that
>     all parallel workers were working properly. Is there a simple way
>     to get profiling information in Flink?
>
>     Best,
>
>     Habib
>
>     On 10/31/2019 2:54 AM, Zhenghua Gao wrote:
>>     I think more runtime information would help figure out where the
>>     problem is:
>>     1) how many parallel instances are actually working
>>     2) the metrics for each operator
>>     3) the JVM profiling information, etc.
>>
>>     *Best Regards,*
>>     *Zhenghua Gao*
>>
>>
>>     On Wed, Oct 30, 2019 at 8:25 PM Habib Mostafaei
>>     <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>> wrote:
>>
>>         Thanks Gao for the reply. I used the parallelism parameter
>>         with different values like 6 and 8, but the execution time is
>>         still not comparable with a single-threaded Python script.
>>         What would be a reasonable value for the parallelism?
>>
>>         Best,
>>
>>         Habib
>>
>>         On 10/30/2019 1:17 PM, Zhenghua Gao wrote:
>>>         The reason might be that the parallelism of your task is only
>>>         1, which is too low.
>>>         See [1] to specify a proper parallelism for your job; the
>>>         execution time should then be reduced significantly.
>>>
>>>         [1]
>>>         https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html
>>>
>>>         *Best Regards,*
>>>         *Zhenghua Gao*
>>>
>>>
>>>         On Tue, Oct 29, 2019 at 9:27 PM Habib Mostafaei
>>>         <habib@inet.tu-berlin.de <ma...@inet.tu-berlin.de>>
>>>         wrote:
>>>
>>>             Hi all,
>>>
>>>             I am running Flink on a standalone cluster and getting
>>>             very long execution times for streaming queries like
>>>             WordCount on a fixed text file. My VM runs Debian 10 with
>>>             16 CPU cores and 32GB of RAM. I have a 2GB text file.
>>>             When I run Flink on a standalone cluster, i.e., one
>>>             JobManager and one TaskManager with 25GB of heap size, it
>>>             took around two hours to finish counting this file, while
>>>             a simple Python script can do it in around 7 minutes.
>>>             Just wondering what is wrong with my setup. I ran the
>>>             experiments on a cluster with six TaskManagers, but I
>>>             still get a very long execution time, around 25 minutes.
>>>             I tried increasing the JVM heap size to lower the
>>>             execution time, but it did not help. I attached the log
>>>             file and the Flink configuration file to this email.
>>>
>>>             Best,
>>>
>>>             Habib
>>>
>