Posted to user@flink.apache.org by Eric Fukuda <e....@gmail.com> on 2016/09/02 15:09:03 UTC

Re: How to get latency info from benchmark

Hi Robert,

I've been trying to build the "performance" project against various versions
of Flink, but the build keeps failing. It seems that I need both the
KafkaZKStringSerializer class and the FlinkKafkaConsumer082 class to build the
project, but no branch has both of them. KafkaZKStringSerializer existed in
the 0.9.0-x branches but was deleted in the 0.9.1-x branches, and
FlinkKafkaConsumer082 is the other way around, so the two never exist in the
same branch. I'm guessing you were using a snapshot somewhere between 0.9.0
and 0.9.1. Could you tell me the SHA you were using?

Regards,
Eric
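
One way to check the "no branch has both" observation mechanically is to list
every ref whose tree contains both source files. This is only a sketch: it
assumes each class lives in a file named after it (KafkaZKStringSerializer.java
and FlinkKafkaConsumer082.java), and the function names has_both and
refs_with_both are made up for illustration.

```shell
# has_both REF
# Succeeds when the tree at REF contains both source files.
# Assumption: each class is stored in a .java file named after the class.
has_both() {
  files=$(git ls-tree -r --name-only "$1" 2>/dev/null) || return 2
  printf '%s\n' "$files" | grep -q 'KafkaZKStringSerializer\.java$' &&
    printf '%s\n' "$files" | grep -q 'FlinkKafkaConsumer082\.java$'
}

# refs_with_both
# Prints every local branch, remote-tracking branch and tag that passes has_both.
refs_with_both() {
  git for-each-ref --format='%(refname:short)' refs/heads refs/remotes refs/tags |
    while read -r ref; do
      if has_both "$ref"; then
        printf '%s\n' "$ref"
      fi
    done
}
```

Run refs_with_both inside a Flink clone after fetching tags. An empty result
would match the situation described above. has_both also accepts a commit SHA,
which helps when probing commits between two releases.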


On Wed, Aug 24, 2016 at 4:57 PM, Robert Metzger <rm...@apache.org> wrote:

> Hi,
>
> Version 0.10-SNAPSHOT is pretty old. The snapshot repository of Apache
> probably doesn't keep old artifacts around forever.
> Maybe you can migrate the tests to Flink 0.10.0, or maybe even to a higher
> version.
>
> Regards,
> Robert
>
> On Wed, Aug 24, 2016 at 10:32 PM, Eric Fukuda <e....@gmail.com>
> wrote:
>
>> Hi Max, Robert,
>>
>> Thanks for the advice. I'm trying to build the "performance" project, but
>> it fails with the following error. Is there a solution for this?
>>
>> [ERROR] Failed to execute goal on project streaming-state-demo: Could not
>> resolve dependencies for project
>> com.dataartisans.flink:streaming-state-demo:jar:1.0-SNAPSHOT: Failure to find
>> org.apache.flink:flink-connector-kafka-083:jar:0.10-SNAPSHOT in
>> https://repository.apache.org/content/repositories/snapshots/ was cached
>> in the local repository, resolution will not be reattempted until the
>> update interval of apache.snapshots has elapsed or updates are forced ->
>> [Help 1]
>>
>>
>>
>>
>> On Wed, Aug 24, 2016 at 8:12 AM, Robert Metzger <rm...@apache.org>
>> wrote:
>>
>>> Hi Eric,
>>>
>>> Max is right, the tool has been used for a different benchmark [1]. The
>>> throughput logger that should produce the right output is this one [2].
>>> Very recently, I've opened a pull request for adding metric-measuring
>>> support into the engine [3]. Maybe that's helpful for your experiments.
>>>
>>>
>>> [1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
>>> [2] https://github.com/dataArtisans/performance/blob/master/flink-jobs/src/main/java/com/github/projectflink/streaming/Throughput.java#L203
>>> [3] https://github.com/apache/flink/pull/2386
>>>
>>>
>>>
>>> On Wed, Aug 24, 2016 at 2:04 PM, Maximilian Michels <mx...@apache.org>
>>> wrote:
>>>
>>>> I believe the AnalyzeTool is for processing logs of a different
>>>> benchmark.
>>>>
>>>> CC Jamie and Robert who worked on the benchmark.
>>>>
>>>> On Wed, Aug 24, 2016 at 3:25 AM, Eric Fukuda <e....@gmail.com>
>>>> wrote:
>>>> > Hi,
>>>> >
>>>> > I'm trying to benchmark Flink without Kafka as mentioned in this post
>>>> > (http://data-artisans.com/extending-the-yahoo-streaming-benchmark/). After
>>>> > running flink.benchmark.state.AdvertisingTopologyFlinkState with
>>>> > user.local.event.generator in localConf.yaml set to 1, I ran
>>>> > flink.benchmark.utils.AnalyzeTool giving
>>>> > flink-1.0.1/log/flink-[username]-jobmanager-0-[servername].log as a
>>>> > command-line argument. I got the following output and it does not have the
>>>> > information about the latency.
>>>> >
>>>> >
>>>> > ================= Latency (0 reports ) =====================
>>>> > ================= Throughput (1 reports ) =====================
>>>> > ====== null (entries: 10150)=======
>>>> > Mean throughput 639078.5018497099
>>>> > Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 2
>>>> >         at java.util.ArrayList.subListRangeCheck(ArrayList.java:962)
>>>> >         at java.util.ArrayList.subList(ArrayList.java:954)
>>>> >         at flink.benchmark.utils.AnalyzeTool.main(AnalyzeTool.java:133)
>>>> >
>>>> >
>>>> > Reading the code in AnalyzeTool.java, I found that it's looking for lines
>>>> > that include "Latency" in the log file, but apparently it's not finding any.
>>>> > I tried grepping the log file, and couldn't find any either. I have one
>>>> > server that runs both JobManager and Task Manager and another server that
>>>> > runs Redis, and they are connected through a network with each other.
>>>> >
>>>> > I think I have to do something to read the data stored in Redis before
>>>> > running AnalyzeTool, but can't figure out what. Does anyone know how to get
>>>> > the latency information?
>>>> >
>>>> > Thanks,
>>>> > Eric
>>>>
>>>
>>>
>>
>
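
The Maven message quoted above ends with "resolution will not be reattempted
until the update interval of apache.snapshots has elapsed or updates are
forced", which indicates that an earlier failed lookup was cached locally. A
minimal sketch of two common ways to force a fresh lookup follows. The function
names are hypothetical, and the default path assumes the standard
~/.m2/repository location. Neither step helps if the 0.10-SNAPSHOT artifact is
really gone from the snapshot repository, as the reply above suspects.

```shell
# retry_with_update [MAVEN ARGS...]
# Re-runs the build with -U, which makes Maven re-check remote repositories
# for snapshots instead of trusting the cached result.
retry_with_update() {
  mvn -U clean package "$@"
}

# clear_failed_lookups [DIR]
# Deletes the *.lastUpdated marker files that Maven leaves behind after a
# failed download, and prints each file it removes.
# DIR defaults to the usual local repository location (an assumption).
clear_failed_lookups() {
  find "${1:-$HOME/.m2/repository}" -type f -name '*.lastUpdated' -print -delete
}
```

For example, clear_failed_lookups ~/.m2/repository/org/apache/flink limits the
cleanup to the Flink artifacts before the build is retried.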

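Before running AnalyzeTool it can be worth counting the "Latency" lines in
every log file of the Flink log directory, not only the JobManager log. The
sketch below does just that. latency_line_counts is a made-up helper name, and
-H is a widely supported grep option rather than strict POSIX. According to
the replies quoted in this thread, AnalyzeTool was written for a different
benchmark, so zero matches in every file would be consistent with the output
shown.

```shell
# latency_line_counts DIR
# Prints "file:count" for each *.log file in DIR, where count is the number
# of lines containing the string "Latency".
latency_line_counts() {
  grep -c -H 'Latency' "$1"/*.log
}
```

For the setup described above this would be called as
latency_line_counts flink-1.0.1/log.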
Re: How to get latency info from benchmark

Posted by Eric Fukuda <e....@gmail.com>.
I got the same error with this commit too. Weird :-( I will try picking the
necessary classes. Thanks anyway.

On Sat, Sep 3, 2016 at 7:41 AM, Robert Metzger <rm...@apache.org> wrote:

> I also can't check out the commit locally... which is weird, because GitHub
> still seems to be able to somehow access it.
>
> Can you try this commit: df42160832ff65ae2a85b478d1dd0b398fa6ef3f ?
>
> I actually believe it's probably easier to just pick the classes you need
> from the "benchmark" repository and fit them to the current code base.

Re: How to get latency info from benchmark

Posted by Robert Metzger <rm...@apache.org>.
I also can't check out the commit locally... which is weird, because GitHub
still seems to be able to somehow access it.

Can you try this commit: df42160832ff65ae2a85b478d1dd0b398fa6ef3f ?

I actually believe it's probably easier to just pick the classes you need
from the "benchmark" repository and fit them to the current code base.

On Fri, Sep 2, 2016 at 9:12 PM, Eric Fukuda <e....@gmail.com> wrote:

> Thanks Robert,
>
> I tried to check out the commit you mentioned, but git returns the error
> "fatal: reference is not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e".
> I've searched for a solution but could not find one. Am I doing something
> wrong?
>
> -----------------
> $ git clone https://github.com/rmetzger/flink.git
> Cloning into 'flink'...
> remote: Counting objects: 321185, done.
> remote: Compressing objects: 100% (3/3), done.
> remote: Total 321185 (delta 1), reused 0 (delta 0), pack-reused 321182
> Receiving objects: 100% (321185/321185), 93.60 MiB | 10.63 MiB/s, done.
> Resolving deltas: 100% (141424/141424), done.
> Checking connectivity... done.
> $ cd flink/
> $ git checkout 547e7490fb99562ca15a2127f0ce1e784db97f3e
> fatal: reference is not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e
> ------------------
>
> Regards,
> Eric

Re: How to get latency info from benchmark

Posted by Eric Fukuda <e....@gmail.com>.
Thanks Robert,

I tried to check out the commit you mentioned, but git returns the error
"fatal: reference is not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e".
I've searched for a solution but could not find one. Am I doing something
wrong?

-----------------
$ git clone https://github.com/rmetzger/flink.git
Cloning into 'flink'...
remote: Counting objects: 321185, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 321185 (delta 1), reused 0 (delta 0), pack-reused 321182
Receiving objects: 100% (321185/321185), 93.60 MiB | 10.63 MiB/s, done.
Resolving deltas: 100% (141424/141424), done.
Checking connectivity... done.
$ cd flink/
$ git checkout 547e7490fb99562ca15a2127f0ce1e784db97f3e
fatal: reference is not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e
------------------

Regards,
Eric
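
"reference is not a tree" generally means that the object is not in the local
clone. A commit that is no longer reachable from any branch or tag (for
example after a rebase and force push) is not transferred by git clone, even
if the hosting site can still display it. A small sketch, with hypothetical
helper names: check for the commit locally and, if it is missing, ask the
remote for that exact SHA. Whether the server honours such a request depends
on its configuration, so the fetch may still fail.

```shell
# commit_present SHA
# Succeeds when SHA names a commit that exists in the local object store.
commit_present() {
  git cat-file -e "$1^{commit}" 2>/dev/null
}

# fetch_commit SHA [REMOTE]
# Fetches SHA from REMOTE (default: origin) unless it is already present.
fetch_commit() {
  commit_present "$1" || git fetch "${2:-origin}" "$1"
}
```

Usage would look like
fetch_commit 547e7490fb99562ca15a2127f0ce1e784db97f3e followed by the same
git checkout as above, if the fetch succeeds.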

On Fri, Sep 2, 2016 at 12:01 PM, Robert Metzger <rm...@apache.org> wrote:

> Hi Eric,
>
> I'm sorry that you are running into these issues. I think the version is
> 0.10-SNAPSHOT, and I think I've used this commit:
> https://github.com/rmetzger/flink/commit/547e749 for some of the runs (of
> the throughput / latency tests, not for the Yahoo benchmark). The commit
> should at least point to the right point in time.
> Note that these benchmarks are pretty old by now, and the performance
> characteristics have probably changed in Flink 1.1 because we've put a lot
> of effort into optimizing Flink for common streaming use cases.
>
> Regards,
> Robert

Re: How to get latency info from benchmark

Posted by Robert Metzger <rm...@apache.org>.
Hi Eric,

I'm sorry that you are running into these issues. I think the version is
0.10-SNAPSHOT, and I think I've used this commit:
https://github.com/rmetzger/flink/commit/547e749 for some of the runs (of
the throughput / latency tests, not for the Yahoo benchmark). The commit
should at least point to the right point in time.
Note that these benchmarks are pretty old by now, and the performance
characteristics have probably changed in Flink 1.1 because we've put a lot
of effort into optimizing Flink for common streaming use cases.

Regards,
Robert


On Fri, Sep 2, 2016 at 5:09 PM, Eric Fukuda <e....@gmail.com> wrote:

> Hi Robert,
>
> I've been trying to build the "performance" project against various versions
> of Flink, but the build keeps failing. It seems that I need both the
> KafkaZKStringSerializer class and the FlinkKafkaConsumer082 class to build the
> project, but no branch has both of them. KafkaZKStringSerializer existed in
> the 0.9.0-x branches but was deleted in the 0.9.1-x branches, and
> FlinkKafkaConsumer082 is the other way around, so the two never exist in the
> same branch. I'm guessing you were using a snapshot somewhere between 0.9.0
> and 0.9.1. Could you tell me the SHA you were using?
>
> Regards,
> Eric