Posted to user@storm.apache.org by Mauro Giusti <ma...@microsoft.com> on 2017/05/12 22:56:24 UTC

Performance of Multi-Lang protocol

Hi -
We are using multi-lang to pass data between storm and mono -

We observe a 6x time increase when messages go from spout to bolt if the bolt is in mono vs. being in Java -

Java can process 10,000 records in 0.7 seconds, while mono requires 4.5 seconds.
The mono bolt was an empty one created with the Storm.Net.Adapter <https://github.com/ziyunhx/storm-net-adapter> library.

This is on a single machine topology - we are still in dev phase and using this solution for now -

Is this expected?
Should we try to minimize multi-lang and inter-process or is this a problem with my specific scenario (mono and/or single machine) ?

Thank you -
Mauro.
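
For a sense of scale, the two timings above can be turned into a per-record cost. This is only arithmetic on the numbers quoted in the message (10,000 records, 0.7 s, 4.5 s); the helper name per_record_micros is made up for illustration, and nothing here was measured again.

```python
# Figures quoted in the message above; nothing here is newly measured.
RECORDS = 10_000
JAVA_SECONDS = 0.7
MONO_SECONDS = 4.5


def per_record_micros(total_seconds: float, records: int = RECORDS) -> float:
    """Average wall-clock cost of one record, in microseconds."""
    return total_seconds / records * 1_000_000


java_us = per_record_micros(JAVA_SECONDS)
mono_us = per_record_micros(MONO_SECONDS)
slowdown = MONO_SECONDS / JAVA_SECONDS

print(f"Java bolt: {java_us:.0f} us per record")
print(f"Mono bolt: {mono_us:.0f} us per record")
print(f"Slowdown:  {slowdown:.1f}x")
```

That works out to roughly 70 microseconds per record in Java versus roughly 450 in mono, about 6.4x, which matches the "6x" figure quoted.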

Re: Performance of Multi-Lang protocol

Posted by Zhechao Ma <ma...@gmail.com>.
HeartSaVioR,

It's here https://www.mail-archive.com/user@storm.apache.org/msg04942.html

2017-05-15 12:14 GMT+08:00 Jungtaek Lim <ka...@gmail.com>:

> Zhechao,
>
> Could you please link the mail about the Python shell bolt performance
> issue if you can find it in the archive?
>
> Thanks in advance!
> Jungtaek Lim (HeartSaVioR)
>
> On Mon, May 15, 2017 at 12:37 PM, Zhechao Ma <ma...@gmail.com> wrote:
>
>> I started to use storm with python since storm 0.9.2, and I'm concerned
>> about multi-lang performance improvement.
>>
>> There is a pull request (https://github.com/apache/storm/pull/1136) for
>> multi-lang performance improvements opened a year ago, but it has not been
>> merged yet. It uses MessagePackSerializer to replace the default JSON
>> Serializer.
>>
>> Also, there was a mail mentioning the Python shell bolt performance issue on
>> 2016/1/3. A benchmark result of Msgpack was given in that mail.
>>
>> I agree with @HeartSaVioR to do python optimization first.
>>
>>
>> 2017-05-13 13:23 GMT+08:00 Jungtaek Lim <ka...@gmail.com>:
>>
>>> I'd like to hear from other multi-lang users as well.
>>>
>>> I guess many users are using Streamparse, so Streamparse users may be
>>> able to report how large the performance difference is. If Streamparse
>>> uses a non-default serde to reduce the performance hit, Storm could even
>>> adopt it as the default serde, but that would break backward compatibility.
>>>
>>> Btw, IMHO, it might be worth focusing on fewer languages for
>>> optimization, for example supporting only Python (as data scientists are
>>> familiar with it) as the second language and applying Python-specific
>>> optimizations. We may also need to support non-Java languages in the new
>>> Streams API, and that might not be easy with the current multi-lang
>>> approach. A PySpark-like approach would be reasonable.
>>>
>>> We could still support multi-lang, but without major improvements.
>>>
>>> Would like to hear opinions on my proposal, too.
>>>
>>> - Jungtaek Lim (HeartSaVioR)
>>>
>>> On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <ma...@microsoft.com> wrote:
>>>
>>>> *My PC:*
>>>>
>>>> My PC is an 8-core Xeon E5 with 16 GB of RAM. When the test starts, only
>>>> 8 GB of memory is occupied.
>>>>
>>>> I increased the memory of the Java VM to 4 GB and it only uses 1 GB
>>>> when the test runs.
>>>>
>>>>
>>>>
>>>> *The Topology:*
>>>>
>>>> On my PC, I have three Spouts in mono, and one Bolt in mono.
>>>>
>>>> The topology is described in Flux – so I have basically zero code in
>>>> Java, all in Flux .yaml + .Net with mono.
>>>>
>>>> All the messages use SHUFFLE and there is one worker only (my PC)
>>>>
>>>>
>>>>
>>>> I run in local mode and I also have a Docker container where I deployed
>>>> this.
>>>>
>>>>
>>>>
>>>> *Topology details:*
>>>>
>>>> The Spouts read from an internal service, I collect about 60/70,000
>>>> records each minute.
>>>>
>>>>
>>>>
>>>> The Bolt reads from the three Spouts and aggregates in memory using
>>>> SQLite. The records are added to SQLite as they arrive; then, every
>>>> 30 seconds, SQLite runs an aggregation and emits the data to an instance
>>>> of Redis cache (via another Bolt hop).
>>>>
>>>>
>>>>
>>>> To test with Java, I replaced the Bolt with a simple Java Bolt that was
>>>> only logging every 10,000 records.
>>>>
>>>> To compare with Mono, I created an empty .net Bolt and did the same.
>>>>
>>>>
>>>>
>>>> *My Tests:*
>>>>
>>>> The Flux topology is attached.
>>>>
>>>> The Java class I used to test and the .Net Bolt are as well.
>>>>
>>>> Again, the Spouts are .Net classes that emit 65K rows per minute.
>>>>
>>>>
>>>>
>>>> The log files are attached, you can see how much time it takes for the
>>>> Bolt to consume 10,000 records –
>>>>
>>>> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
>>>> records takes around 4.5 seconds.
>>>>
>>>> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each
>>>> 10,000 records takes around 0.7 seconds.
>>>>
>>>> The Linux.txt is on the Docker container (still on my PC but using
>>>> Docker for Windows in Linux Dockers mode), using mono but on Linux this
>>>> time - the results are comparable with Mono on Windows (4.5 seconds per
>>>> 10,000 records).
>>>>
>>>> I also tried calling directly the Windows exe on Windows in local mode,
>>>> bypassing mono – the results were not pretty: 15 seconds per 10,000 records
>>>> (NetExe.txt)
>>>>
>>>>
>>>>
>>>> *Results:*
>>>>
>>>> I know I can scale out and partition the data, but the amount of
>>>> processing did not seem to require that –
>>>>
>>>>
>>>>
>>>> Maybe one issue is that the object I am moving has 11 fields?
>>>>
>>>>
>>>>
>>>> I can try to create a mini-repro if the dev team is interested –
>>>> hopefully this might find what the bottleneck is -
>>>>
>>>>
>>>>
>>>> Thanks for your attention -
>>>>
>>>> Mauro.
>>>>
>>>>
>>>>
>>>> *From:* P. Taylor Goetz [mailto:ptgoetz@gmail.com]
>>>> *Sent:* Friday, May 12, 2017 4:55 PM
>>>> *To:* user@storm.apache.org; dev@storm.apache.org
>>>> *Subject:* Re: Performance of Multi-Lang protocol
>>>>
>>>>
>>>>
>>>> Adding dev@ mailing list...
>>>>
>>>>
>>>>
>>>> There is definitely a performance hit. But it shouldn't be as drastic
>>>> as you describe.
>>>>
>>>>
>>>>
>>>> Can you share some of your environment characteristics?
>>>>
>>>>
>>>>
>>>> I've been looking at the Apache Arrow project (full disclosure: I'm a
>>>> PMC member) as a means for improved performance (it essentially would
>>>> remove the performance hit for serialize/deserialize operations). This is
>>>> particularly relevant to multi-lang, but could also apply to same-machine
>>>> inter-worker communication.
>>>>
>>>>
>>>>
>>>> At this point I don't feel Arrow is at a production level maturity, but
>>>> is getting close. I definitely feel it's worth exploring at PoC level.
>>>>
>>>>
>>>>
>>>> -Taylor
>>>>
>>>>
>>>> On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
>>>>
>>>> Hi –
>>>>
>>>> We are using multi-lang to pass data between storm and mono –
>>>>
>>>>
>>>>
>>>> We observe a 6x time increase when messages go from spout to bolt if
>>>> the bolt is in mono vs. being in Java –
>>>>
>>>>
>>>>
>>>> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
>>>> seconds.
>>>>
>>>> The mono bolt was an empty one created with Storm.Net.Adapter
>>>> <https://github.com/ziyunhx/storm-net-adapter>
>>>> library
>>>>
>>>>
>>>>
>>>> This is on a single machine topology – we are still in dev phase and
>>>> using this solution for now -
>>>>
>>>>
>>>>
>>>> Is this expected?
>>>>
>>>> Should we try to minimize multi-lang and inter-process or is this a
>>>> problem with my specific scenario (mono and/or single machine) ?
>>>>
>>>>
>>>>
>>>> Thank you –
>>>>
>>>> Mauro.
>>>>
>>>>
>>
>>
>> --
>> Thanks
>> Zhechao Ma
>>
>


-- 
Thanks
Zhechao Ma

Re: Performance of Multi-Lang protocol

Posted by Jungtaek Lim <ka...@gmail.com>.
Zhechao,

Could you please link the mail about the Python shell bolt performance
issue if you can find it in the archive?

Thanks in advance!
Jungtaek Lim (HeartSaVioR)

On Mon, May 15, 2017 at 12:37 PM, Zhechao Ma <ma...@gmail.com> wrote:

> I started to use storm with python since storm 0.9.2, and I'm concerned
> about multi-lang performance improvement.
>
> There is a pull request (https://github.com/apache/storm/pull/1136) for
> multi-lang performance improvements opened a year ago, but it has not been
> merged yet. It uses MessagePackSerializer to replace the default JSON
> Serializer.
>
> Also, there was a mail mentioning the Python shell bolt performance issue on
> 2016/1/3. A benchmark result of Msgpack was given in that mail.
>
> I agree with @HeartSaVioR to do python optimization first.
>
>
> 2017-05-13 13:23 GMT+08:00 Jungtaek Lim <ka...@gmail.com>:
>
>> I'd like to hear from other multi-lang users as well.
>>
>> I guess many users are using Streamparse, so Streamparse users may be
>> able to report how large the performance difference is. If Streamparse
>> uses a non-default serde to reduce the performance hit, Storm could even
>> adopt it as the default serde, but that would break backward compatibility.
>>
>> Btw, IMHO, it might be worth focusing on fewer languages for
>> optimization, for example supporting only Python (as data scientists are
>> familiar with it) as the second language and applying Python-specific
>> optimizations. We may also need to support non-Java languages in the new
>> Streams API, and that might not be easy with the current multi-lang
>> approach. A PySpark-like approach would be reasonable.
>>
>> We could still support multi-lang, but without major improvements.
>>
>> Would like to hear opinions on my proposal, too.
>>
>> - Jungtaek Lim (HeartSaVioR)
>>
>> On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <ma...@microsoft.com> wrote:
>>
>>> *My PC:*
>>>
>>> My PC is an 8-core Xeon E5 with 16 GB of RAM. When the test starts, only
>>> 8 GB of memory is occupied.
>>>
>>> I increased the memory of the Java VM to 4 GB and it only uses 1 GB when
>>> the test runs.
>>>
>>>
>>>
>>> *The Topology:*
>>>
>>> On my PC, I have three Spouts in mono, and one Bolt in mono.
>>>
>>> The topology is described in Flux – so I have basically zero code in
>>> Java, all in Flux .yaml + .Net with mono.
>>>
>>> All the messages use SHUFFLE and there is one worker only (my PC)
>>>
>>>
>>>
>>> I run in local mode and I also have a Docker container where I deployed
>>> this.
>>>
>>>
>>>
>>> *Topology details:*
>>>
>>> The Spouts read from an internal service, I collect about 60/70,000
>>> records each minute.
>>>
>>>
>>>
>>> The Bolt reads from the three Spouts and aggregates in memory using
>>> SQLite. The records are added to SQLite as they arrive; then, every
>>> 30 seconds, SQLite runs an aggregation and emits the data to an instance
>>> of Redis cache (via another Bolt hop).
>>>
>>>
>>>
>>> To test with Java, I replaced the Bolt with a simple Java Bolt that was
>>> only logging every 10,000 records.
>>>
>>> To compare with Mono, I created an empty .net Bolt and did the same.
>>>
>>>
>>>
>>> *My Tests:*
>>>
>>> The Flux topology is attached.
>>>
>>> The Java class I used to test and the .Net Bolt are as well.
>>>
>>> Again, the Spouts are .Net classes that emit 65K rows per minute.
>>>
>>>
>>>
>>> The log files are attached, you can see how much time it takes for the
>>> Bolt to consume 10,000 records –
>>>
>>> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
>>> records takes around 4.5 seconds.
>>>
>>> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each
>>> 10,000 records takes around 0.7 seconds.
>>>
>>> The Linux.txt is on the Docker container (still on my PC but using
>>> Docker for Windows in Linux Dockers mode), using mono but on Linux this
>>> time - the results are comparable with Mono on Windows (4.5 seconds per
>>> 10,000 records).
>>>
>>> I also tried calling directly the Windows exe on Windows in local mode,
>>> bypassing mono – the results were not pretty: 15 seconds per 10,000 records
>>> (NetExe.txt)
>>>
>>>
>>>
>>> *Results:*
>>>
>>> I know I can scale out and partition the data, but the amount of
>>> processing did not seem to require that –
>>>
>>>
>>>
>>> Maybe one issue is that the object I am moving has 11 fields?
>>>
>>>
>>>
>>> I can try to create a mini-repro if the dev team is interested –
>>> hopefully this might find what the bottleneck is -
>>>
>>>
>>>
>>> Thanks for your attention -
>>>
>>> Mauro.
>>>
>>>
>>>
>>> *From:* P. Taylor Goetz [mailto:ptgoetz@gmail.com]
>>> *Sent:* Friday, May 12, 2017 4:55 PM
>>> *To:* user@storm.apache.org; dev@storm.apache.org
>>> *Subject:* Re: Performance of Multi-Lang protocol
>>>
>>>
>>>
>>> Adding dev@ mailing list...
>>>
>>>
>>>
>>> There is definitely a performance hit. But it shouldn't be as drastic as
>>> you describe.
>>>
>>>
>>>
>>> Can you share some of your environment characteristics?
>>>
>>>
>>>
>>> I've been looking at the Apache Arrow project (full disclosure: I'm a
>>> PMC member) as a means for improved performance (it essentially would
>>> remove the performance hit for serialize/deserialize operations). This is
>>> particularly relevant to multi-lang, but could also apply to same-machine
>>> inter-worker communication.
>>>
>>>
>>>
>>> At this point I don't feel Arrow is at a production level maturity, but
>>> is getting close. I definitely feel it's worth exploring at PoC level.
>>>
>>>
>>>
>>> -Taylor
>>>
>>>
>>> On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
>>>
>>> Hi –
>>>
>>> We are using multi-lang to pass data between storm and mono –
>>>
>>>
>>>
>>> We observe a 6x time increase when messages go from spout to bolt if the
>>> bolt is in mono vs. being in Java –
>>>
>>>
>>>
>>> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
>>> seconds.
>>>
>>> The mono bolt was an empty one created with Storm.Net.Adapter
>>> <https://github.com/ziyunhx/storm-net-adapter>
>>> library
>>>
>>>
>>>
>>> This is on a single machine topology – we are still in dev phase and
>>> using this solution for now -
>>>
>>>
>>>
>>> Is this expected?
>>>
>>> Should we try to minimize multi-lang and inter-process or is this a
>>> problem with my specific scenario (mono and/or single machine) ?
>>>
>>>
>>>
>>> Thank you –
>>>
>>> Mauro.
>>>
>>>
>
>
> --
> Thanks
> Zhechao Ma
>

Re: Performance of Multi-Lang protocol

Posted by Zhechao Ma <ma...@gmail.com>.
I have been using Storm with Python since Storm 0.9.2, and I'm interested
in multi-lang performance improvements.

There is a pull request (https://github.com/apache/storm/pull/1136) for
multi-lang performance improvements that was opened a year ago but has not
been merged yet. It uses MessagePackSerializer to replace the default JSON
serializer.

Also, there was a mail on 2016/1/3 mentioning the Python shell bolt
performance issue. That mail included a Msgpack benchmark result.

I agree with @HeartSaVioR that Python optimization should come first.
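
To make the serializer discussion concrete, here is a rough sketch of the per-tuple work the default JSON path implies. As far as I understand the multi-lang protocol, each message is JSON text followed by a line containing "end"; the frame/unframe helpers, the sample tuple, and its 11 field values are all hypothetical, and this times only Python's own json module, not Storm, mono, or MessagePack.

```python
import json
import time


def frame(message: dict) -> str:
    """Encode one message as JSON text followed by a line containing 'end'."""
    return json.dumps(message) + "\nend\n"


def unframe(text: str) -> dict:
    """Decode one framed message produced by frame()."""
    body, terminator, _ = text.rpartition("\nend\n")
    if terminator != "\nend\n":
        raise ValueError("missing 'end' terminator")
    return json.loads(body)


# Hypothetical tuple with 11 fields, mirroring the record shape described
# in this thread; the field values are invented for illustration.
sample = {
    "id": "-6955786537413359385",
    "comp": "spout-1",
    "stream": "default",
    "task": 9,
    "tuple": ["host-a", "metric", 1, 2, 3, 4.5, "x", "y", "z", 10, 11],
}

ROUND_TRIPS = 10_000
start = time.perf_counter()
for _ in range(ROUND_TRIPS):
    decoded = unframe(frame(sample))
elapsed = time.perf_counter() - start

print(f"{ROUND_TRIPS} JSON round trips took {elapsed:.3f} s")
```

The number this prints depends entirely on the machine and says nothing about the pipe I/O between the worker and the subprocess, so treat it as a lower bound on serialization cost rather than a benchmark of Storm.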


2017-05-13 13:23 GMT+08:00 Jungtaek Lim <ka...@gmail.com>:

> I'd like to hear from other multi-lang users as well.
>
> I guess many users are using Streamparse, so Streamparse users may be able
> to report how large the performance difference is. If Streamparse uses a
> non-default serde to reduce the performance hit, Storm could even adopt it
> as the default serde, but that would break backward compatibility.
>
> Btw, IMHO, it might be worth focusing on fewer languages for optimization,
> for example supporting only Python (as data scientists are familiar with
> it) as the second language and applying Python-specific optimizations. We
> may also need to support non-Java languages in the new Streams API, and
> that might not be easy with the current multi-lang approach. A
> PySpark-like approach would be reasonable.
>
> We could still support multi-lang, but without major improvements.
>
> Would like to hear opinions on my proposal, too.
>
> - Jungtaek Lim (HeartSaVioR)
>
> On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <ma...@microsoft.com> wrote:
>
>> *My PC:*
>>
>> My PC is an 8-core Xeon E5 with 16 GB of RAM. When the test starts, only
>> 8 GB of memory is occupied.
>>
>> I increased the memory of the Java VM to 4 GB and it only uses 1 GB when
>> the test runs.
>>
>>
>>
>> *The Topology:*
>>
>> On my PC, I have three Spouts in mono, and one Bolt in mono.
>>
>> The topology is described in Flux – so I have basically zero code in
>> Java, all in Flux .yaml + .Net with mono.
>>
>> All the messages use SHUFFLE and there is one worker only (my PC)
>>
>>
>>
>> I run in local mode and I also have a Docker container where I deployed
>> this.
>>
>>
>>
>> *Topology details:*
>>
>> The Spouts read from an internal service, I collect about 60/70,000
>> records each minute.
>>
>>
>>
>> The Bolt reads from the three Spouts and aggregates in memory using
>> SQLite. The records are added to SQLite as they arrive; then, every
>> 30 seconds, SQLite runs an aggregation and emits the data to an instance
>> of Redis cache (via another Bolt hop).
>>
>>
>>
>> To test with Java, I replaced the Bolt with a simple Java Bolt that was
>> only logging every 10,000 records.
>>
>> To compare with Mono, I created an empty .net Bolt and did the same.
>>
>>
>>
>> *My Tests:*
>>
>> The Flux topology is attached.
>>
>> The Java class I used to test and the .Net Bolt are as well.
>>
>> Again, the Spouts are .Net classes that emit 65K rows per minute.
>>
>>
>>
>> The log files are attached, you can see how much time it takes for the
>> Bolt to consume 10,000 records –
>>
>> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
>> records takes around 4.5 seconds.
>>
>> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each 10,000
>> records takes around 0.7 seconds.
>>
>> The Linux.txt is on the Docker container (still on my PC but using Docker
>> for Windows in Linux Dockers mode), using mono but on Linux this time - the
>> results are comparable with Mono on Windows (4.5 seconds per 10,000
>> records).
>>
>> I also tried calling directly the Windows exe on Windows in local mode,
>> bypassing mono – the results were not pretty: 15 seconds per 10,000 records
>> (NetExe.txt)
>>
>>
>>
>> *Results:*
>>
>> I know I can scale out and partition the data, but the amount of
>> processing did not seem to require that –
>>
>>
>>
>> Maybe one issue is that the object I am moving has 11 fields?
>>
>>
>>
>> I can try to create a mini-repro if the dev team is interested –
>> hopefully this might find what the bottleneck is -
>>
>>
>>
>> Thanks for your attention -
>>
>> Mauro.
>>
>>
>>
>> *From:* P. Taylor Goetz [mailto:ptgoetz@gmail.com]
>> *Sent:* Friday, May 12, 2017 4:55 PM
>> *To:* user@storm.apache.org; dev@storm.apache.org
>> *Subject:* Re: Performance of Multi-Lang protocol
>>
>>
>>
>> Adding dev@ mailing list...
>>
>>
>>
>> There is definitely a performance hit. But it shouldn't be as drastic as
>> you describe.
>>
>>
>>
>> Can you share some of your environment characteristics?
>>
>>
>>
>> I've been looking at the Apache Arrow project (full disclosure: I'm a PMC
>> member) as a means for improved performance (it essentially would remove
>> the performance hit for serialize/deserialize operations). This is
>> particularly relevant to multi-lang, but could also apply to same-machine
>> inter-worker communication.
>>
>>
>>
>> At this point I don't feel Arrow is at a production level maturity, but
>> is getting close. I definitely feel it's worth exploring at PoC level.
>>
>>
>>
>> -Taylor
>>
>>
>> On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
>>
>> Hi –
>>
>> We are using multi-lang to pass data between storm and mono –
>>
>>
>>
>> We observe a 6x time increase when messages go from spout to bolt if the
>> bolt is in mono vs. being in Java –
>>
>>
>>
>> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
>> seconds.
>>
>> The mono bolt was an empty one created with Storm.Net.Adapter
>> <https://github.com/ziyunhx/storm-net-adapter>
>> library
>>
>>
>>
>> This is on a single machine topology – we are still in dev phase and
>> using this solution for now -
>>
>>
>>
>> Is this expected?
>>
>> Should we try to minimize multi-lang and inter-process or is this a
>> problem with my specific scenario (mono and/or single machine) ?
>>
>>
>>
>> Thank you –
>>
>> Mauro.
>>
>>


-- 
Thanks
Zhechao Ma

Re: Performance of Multi-Lang protocol

Posted by Jungtaek Lim <ka...@gmail.com>.
I'd like to hear from other multi-lang users as well.

I guess many users are using Streamparse, so Streamparse users may be able
to report how large the performance difference is. If Streamparse uses a
non-default serde to reduce the performance hit, Storm could even adopt it
as the default serde, but that would break backward compatibility.

Btw, IMHO, it might be worth focusing on fewer languages for optimization,
for example supporting only Python (as data scientists are familiar with
it) as the second language and applying Python-specific optimizations. We
may also need to support non-Java languages in the new Streams API, and
that might not be easy with the current multi-lang approach. A
PySpark-like approach would be reasonable.

We could still support multi-lang, but without major improvements.

Would like to hear opinions on my proposal, too.

- Jungtaek Lim (HeartSaVioR)

On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <ma...@microsoft.com> wrote:

> *My PC:*
>
> My PC is an 8-core Xeon E5 with 16 GB of RAM. When the test starts, only
> 8 GB of memory is occupied.
>
> I increased the memory of the Java VM to 4 GB and it only uses 1 GB when
> the test runs.
>
>
>
> *The Topology:*
>
> On my PC, I have three Spouts in mono, and one Bolt in mono.
>
> The topology is described in Flux – so I have basically zero code in Java,
> all in Flux .yaml + .Net with mono.
>
> All the messages use SHUFFLE and there is one worker only (my PC)
>
>
>
> I run in local mode and I also have a Docker container where I deployed
> this.
>
>
>
> *Topology details:*
>
> The Spouts read from an internal service, I collect about 60/70,000
> records each minute.
>
>
>
> The Bolt reads from the three Spouts and aggregates in memory using
> SQLite. The records are added to SQLite as they arrive; then, every 30
> seconds, SQLite runs an aggregation and emits the data to an instance of
> Redis cache (via another Bolt hop).
>
>
>
> To test with Java, I replaced the Bolt with a simple Java Bolt that was
> only logging every 10,000 records.
>
> To compare with Mono, I created an empty .net Bolt and did the same.
>
>
>
> *My Tests:*
>
> The Flux topology is attached.
>
> The Java class I used to test and the .Net Bolt are as well.
>
> Again, the Spouts are .Net classes that emit 65K rows per minute.
>
>
>
> The log files are attached, you can see how much time it takes for the
> Bolt to consume 10,000 records –
>
> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
> records takes around 4.5 seconds.
>
> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each 10,000
> records takes around 0.7 seconds.
>
> The Linux.txt is on the Docker container (still on my PC but using Docker
> for Windows in Linux Dockers mode), using mono but on Linux this time - the
> results are comparable with Mono on Windows (4.5 seconds per 10,000
> records).
>
> I also tried calling directly the Windows exe on Windows in local mode,
> bypassing mono – the results were not pretty: 15 seconds per 10,000 records
> (NetExe.txt)
>
>
>
> *Results:*
>
> I know I can scale out and partition the data, but the amount of
> processing did not seem to require that –
>
>
>
> Maybe one issue is that the object I am moving has 11 fields?
>
>
>
> I can try to create a mini-repro if the dev team is interested – hopefully
> this might find what the bottleneck is -
>
>
>
> Thanks for your attention -
>
> Mauro.
>
>
>
> *From:* P. Taylor Goetz [mailto:ptgoetz@gmail.com]
> *Sent:* Friday, May 12, 2017 4:55 PM
> *To:* user@storm.apache.org; dev@storm.apache.org
> *Subject:* Re: Performance of Multi-Lang protocol
>
>
>
> Adding dev@ mailing list...
>
>
>
> There is definitely a performance hit. But it shouldn't be as drastic as
> you describe.
>
>
>
> Can you share some of your environment characteristics?
>
>
>
> I've been looking at the Apache Arrow project (full disclosure: I'm a PMC
> member) as a means for improved performance (it essentially would remove
> the performance hit for serialize/deserialize operations). This is
> particularly relevant to multi-lang, but could also apply to same-machine
> inter-worker communication.
>
>
>
> At this point I don't feel Arrow is at a production level maturity, but is
> getting close. I definitely feel it's worth exploring at PoC level.
>
>
>
> -Taylor
>
>
> On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
>
> Hi –
>
> We are using multi-lang to pass data between storm and mono –
>
>
>
> We observe a 6x time increase when messages go from spout to bolt if the
> bolt is in mono vs. being in Java –
>
>
>
> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
> seconds.
>
> The mono bolt was an empty one created with Storm.Net.Adapter
> <https://github.com/ziyunhx/storm-net-adapter>
> library
>
>
>
> This is on a single machine topology – we are still in dev phase and using
> this solution for now -
>
>
>
> Is this expected?
>
> Should we try to minimize multi-lang and inter-process or is this a
> problem with my specific scenario (mono and/or single machine) ?
>
>
>
> Thank you –
>
> Mauro.
>
>

Re: Performance of Multi-Lang protocol

Posted by Jungtaek Lim <ka...@gmail.com>.
I'd like to hear from other multi-lang users as well.

I guess many users are using Streamparse, so Streamparse users may be able
to report how large the performance difference is. If Streamparse uses a
non-default serde to reduce the performance hit, Storm could even adopt it
as the default serde, but that would break backward compatibility.

Btw, IMHO, it might be worth focusing optimization on fewer languages, for
example supporting only Python (as data scientists are familiar with it) as
a second language and applying Python-specific optimizations. We may also
need to support non-Java languages for the new Streams API, and that might
not be easy with the current multi-lang approach. A PySpark-like approach
would be reasonable.

We could still support multi-lang, just without major further improvements.

Would like to hear opinions on my proposal, too.

- Jungtaek Lim (HeartSaVioR)

On Sat, May 13, 2017 at 9:46 AM, Mauro Giusti <ma...@microsoft.com> wrote:

> *My PC:*
>
> My PC is an 8-core Xeon E5 with 16 GB of RAM, when the test starts, I only
> have 8 GB of memory occupied.
>
> I increased the memory of the Java VM to 4 GB and it only uses 1 GB when
> the test runs.
>
>
>
> *The Topology:*
>
> On my PC, I have three Spouts in mono, and one Bolt in mono.
>
> The topology is described in Flux – so I have basically zero code in Java,
> all in Flux .yaml + .Net with mono.
>
> All the messages use SHUFFLE and there is one worker only (my PC)
>
>
>
> I run in local mode and I also have a Docker container where I deployed
> this.
>
>
>
> *Topology details:*
>
> The Spouts read from an internal service, I collect about 60/70,000
> records each minute.
>
>
>
> The Bolt reads from the three Spouts and makes aggregation in memory using
> SQLite, the records are added to SQLite as they arrive, then every 30
> seconds SQLite runs an aggregation and emits the data to an instance of
> Redis cache (via another Bolt hop).
>
>
>
> To test with Java, I replaced the Bolt with a simple Java Bolt that was
> only logging every 10,000 records.
>
> To compare with Mono, I created an empty .net Bolt and did the same.
>
>
>
> *My Tests:*
>
> The Flux topology is attached.
>
> The Java class I used to test and the .Net Bolt are as well.
>
> Again, the Spouts are .Net classes that emit 65K rows per minute.
>
>
>
> The log files are attached, you can see how much time it takes for the
> Bolt to consume 10,000 records –
>
> Inter-Language.txt is on my PC using the mono debug bolt, each 10,000
> records takes around 4.5 seconds.
>
> The Java.txt is on my PC using Java (TransformEchoBolt.Java), each 10,000
> records takes around 0.7 seconds.
>
> The Linux.txt is on the Docker container (still on my PC but using Docker
> for Windows in Linux Dockers mode), using mono but on Linux this time - the
> results are comparable to Mono on Windows (4.5 seconds per 10,000
> records).
>
> I also tried calling directly the Windows exe on Windows in local mode,
> bypassing mono – the results were not pretty: 15 seconds per 10,000 records
> (NetExe.txt)
>
>
>
> *Results:*
>
> I know I can scale out and partition the data, but the amount of
> processing did not seem to require that –
>
>
>
> Maybe one issue is that the object I am moving has 11 fields?
>
>
>
> I can try to create a mini-repro if the dev team is interested – hopefully
> this might find what the bottleneck is -
>
>
>
> Thanks for your attention -
>
> Mauro.
>
>
>
> *From:* P. Taylor Goetz [mailto:ptgoetz@gmail.com]
> *Sent:* Friday, May 12, 2017 4:55 PM
> *To:* user@storm.apache.org; dev@storm.apache.org
> *Subject:* Re: Performance of Multi-Lang protocol
>
>
>
> Adding dev@ mailing list...
>
>
>
> There is definitely a performance hit. But it shouldn't be as drastic as
> you describe.
>
>
>
> Can you share some of your environment characteristics?
>
>
>
> I've been looking at the Apache Arrow project (full disclosure: I'm a PMC
> member) as a means for improved performance (it essentially would remove
> the performance hit for serialize/deserialize operations). This is
> particularly relevant to multi-lang, but could also apply to same-machine
> inter-worker communication.
>
>
>
> At this point I don't feel Arrow is at a production level maturity, but is
> getting close. I definitely feel it's worth exploring at PoC level.
>
>
>
> -Taylor
>
>
> On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
>
> Hi –
>
> We are using multi-lang to pass data between storm and mono –
>
>
>
> We observe a 6x time increase when messages go from spout to bolt if the
> bolt is in mono vs. being in Java –
>
>
>
> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5
> seconds.
>
> The mono bolt was an empty one created with Storm.Net.Adapter
> <https://github.com/ziyunhx/storm-net-adapter>
> library
>
>
>
> This is on a single machine topology – we are still in dev phase and using
> this solution for now -
>
>
>
> Is this expected?
>
> Should we try to minimize multi-lang and inter-process or is this a
> problem with my specific scenario (mono and/or single machine) ?
>
>
>
> Thank you –
>
> Mauro.
>
>

RE: Performance of Multi-Lang protocol

Posted by Mauro Giusti <ma...@microsoft.com>.
My PC:
My PC is an 8-core Xeon E5 with 16 GB of RAM; when the test starts, only 8 GB of memory is in use.
I increased the Java VM memory to 4 GB, and it uses only 1 GB while the test runs.

The Topology:
On my PC, I have three Spouts in mono, and one Bolt in mono.
The topology is described in Flux – so I have basically zero code in Java, all in Flux .yaml + .Net with mono.
All the messages use SHUFFLE and there is one worker only (my PC)

I run in local mode and I also have a Docker container where I deployed this.

Topology details:
The Spouts read from an internal service; I collect about 60,000 to 70,000 records each minute.

The Bolt reads from the three Spouts and aggregates in memory using SQLite: records are added to SQLite as they arrive, then every 30 seconds SQLite runs an aggregation and emits the data to a Redis cache instance (via another Bolt hop).

To test with Java, I replaced the Bolt with a simple Java Bolt that only logs every 10,000 records.
To compare with Mono, I created an empty .NET Bolt that does the same.

My Tests:
The Flux topology is attached.
The Java class I used to test and the .NET Bolt are attached as well.
Again, the Spouts are .NET classes that emit 65K rows per minute.

The log files are attached; you can see how long the Bolt takes to consume 10,000 records:
Inter-Language.txt is on my PC using the mono debug bolt; each 10,000 records take around 4.5 seconds.
Java.txt is on my PC using Java (TransformEchoBolt.Java); each 10,000 records take around 0.7 seconds.
Linux.txt is on the Docker container (still on my PC, but using Docker for Windows in Linux containers mode), using mono but on Linux this time; the results are comparable to Mono on Windows (4.5 seconds per 10,000 records).
I also tried calling the Windows exe directly on Windows in local mode, bypassing mono. The results were not pretty: 15 seconds per 10,000 records (NetExe.txt).
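
As a rough sanity check on these figures, the per-batch times can be turned into throughput and per-record cost. This is only arithmetic on the numbers quoted in this thread; the one assumption is that 65K rows per minute is the combined rate of the three Spouts (the names below are illustrative, not taken from the attached files).

```python
# Figures quoted in this thread; everything else is derived from them.
BATCH = 10_000                      # records between two log lines
SECONDS_PER_BATCH = {
    "java": 0.7,                    # Java.txt
    "mono": 4.5,                    # Inter-Language.txt / Linux.txt
    "net_exe": 15.0,                # NetExe.txt
}
# Assumption: 65K rows per minute is the combined rate of all three Spouts.
INPUT_PER_MINUTE = 65_000


def throughput(seconds_per_batch: float) -> float:
    """Records per second the bolt sustains at the measured batch time."""
    return BATCH / seconds_per_batch


def per_record_ms(seconds_per_batch: float) -> float:
    """Average time spent per record, in milliseconds."""
    return seconds_per_batch / BATCH * 1000.0


input_rate = INPUT_PER_MINUTE / 60.0    # records per second arriving

for name, secs in SECONDS_PER_BATCH.items():
    rate = throughput(secs)
    print(
        f"{name:8s} {rate:8.0f} rec/s  "
        f"{per_record_ms(secs):.3f} ms/rec  "
        f"headroom x{rate / input_rate:.2f}"
    )
```

Under that assumption, the mono bolt sustains roughly 2,200 records per second against roughly 1,080 records per second of input, so it keeps up with about 2x headroom, while the direct .NET exe at about 670 records per second would fall behind.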

Results:
I know I can scale out and partition the data, but the amount of processing did not seem to require that –

Maybe one issue is that the object I am moving has 11 fields?
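
One way to get a feel for whether the field count matters is a minimal Python sketch like the one below. It assumes the default JSON framing of the multi-lang protocol (a JSON object followed by a line containing `end`) and times a serialize/parse round trip for 1-field and 11-field tuples. The tuple id, component name, and field values are made up for illustration, and it measures only Python's `json` module, not Storm's Java side or Storm.Net.Adapter, so the numbers are indicative at best.

```python
import json
import time


def make_message(n_fields: int, task_id: int = 1) -> dict:
    """Build a tuple message in the shape used by the multi-lang JSON protocol."""
    return {
        "id": "-6955786537413359385",   # hypothetical tuple id
        "comp": "spout-1",              # hypothetical component name
        "stream": "default",
        "task": task_id,
        "tuple": [f"value-{i}" for i in range(n_fields)],
    }


def frame(message: dict) -> str:
    """Serialize one message: JSON followed by a line containing 'end'."""
    return json.dumps(message) + "\nend\n"


def unframe(text: str) -> dict:
    """Parse one framed message back into a dict."""
    body, terminator, _ = text.rpartition("\nend\n")
    if not terminator:
        raise ValueError("missing 'end' terminator")
    return json.loads(body)


def round_trip_seconds(n_fields: int, n_messages: int = 10_000) -> float:
    """Time n_messages serialize + parse round trips of an n_fields tuple."""
    message = make_message(n_fields)
    start = time.perf_counter()
    for _ in range(n_messages):
        unframe(frame(message))
    return time.perf_counter() - start


if __name__ == "__main__":
    for fields in (1, 11):
        elapsed = round_trip_seconds(fields)
        print(f"{fields:2d} fields: {elapsed:.3f} s per 10,000 messages")
```

If the 11-field case costs only a small fraction of the 4.5 seconds, serialization of the payload alone is unlikely to explain the gap, and the pipe I/O or the adapter's read loop would be the next place to look.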

I can try to create a mini-repro if the dev team is interested; hopefully that would help find the bottleneck.

Thanks for your attention -
Mauro.

From: P. Taylor Goetz [mailto:ptgoetz@gmail.com]
Sent: Friday, May 12, 2017 4:55 PM
To: user@storm.apache.org; dev@storm.apache.org
Subject: Re: Performance of Multi-Lang protocol

Adding dev@ mailing list...

There is definitely a performance hit. But it shouldn't be as drastic as you describe.

Can you share some of your environment characteristics?

I've been looking at the Apache Arrow project (full disclosure: I'm a PMC member) as a means for improved performance (it essentially would remove the performance hit for serialize/deserialize operations). This is particularly relevant to multi-lang, but could also apply to same-machine inter-worker communication.

At this point I don't feel Arrow is at a production level maturity, but is getting close. I definitely feel it's worth exploring at PoC level.

-Taylor

On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
Hi –
We are using multi-lang to pass data between storm and mono –

We observe a 6x time increase when messages go from spout to bolt if the bolt is in mono vs. being in Java –

Java can process 10,000 records in 0.7 seconds, while mono requires 4.5 seconds.
The mono bolt was an empty one created with Storm.Net.Adapter<https://github.com/ziyunhx/storm-net-adapter> library

This is on a single machine topology – we are still in dev phase and using this solution for now -

Is this expected?
Should we try to minimize multi-lang and inter-process or is this a problem with my specific scenario (mono and/or single machine) ?

Thank you –
Mauro.

Re: Performance of Multi-Lang protocol

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Adding dev@ mailing list...

There is definitely a performance hit. But it shouldn't be as drastic as you describe.

Can you share some of your environment characteristics?

I've been looking at the Apache Arrow project (full disclosure: I'm a PMC member) as a means for improved performance (it essentially would remove the performance hit for serialize/deserialize operations). This is particularly relevant to multi-lang, but could also apply to same-machine inter-worker communication.

At this point I don't feel Arrow is at production-level maturity, but it is getting close. I definitely feel it's worth exploring at PoC level.

-Taylor

> On May 12, 2017, at 6:56 PM, Mauro Giusti <ma...@microsoft.com> wrote:
> 
> Hi –
> We are using multi-lang to pass data between storm and mono –
>  
> We observe a 6x time increase when messages go from spout to bolt if the bolt is in mono vs. being in Java –
>  
> Java can process 10,000 records in 0.7 seconds, while mono requires 4.5 seconds.
> The mono bolt was an empty one created with Storm.Net.Adapter library
>  
> This is on a single machine topology – we are still in dev phase and using this solution for now -
>  
> Is this expected?
> Should we try to minimize multi-lang and inter-process or is this a problem with my specific scenario (mono and/or single machine) ?
>  
> Thank you –
> Mauro.
