Posted to dev@skywalking.apache.org by Aries <da...@foxmail.com> on 2020/07/14 02:13:19 UTC

save tracing data to file.

Hi all,

I have noticed that SkyWalking uses a heap buffer to cache tracing data, and this often causes data loss. To address this, I would like to add high-performance file storage to SkyWalking so that tracing data can be saved to disk.

With tracing data saved to files, SkyWalking could absorb large backlogs. We would not have to worry about how much tracing data is produced or whether the OAP server is working, because the data would already be saved.

Do we need this feature? Any suggestions?

Thanks

Re: save tracing data to file.

Posted by xkz <da...@foxmail.com>.

Hi
Thanks for Sheng Wu's reply. When you say 'because we need to keep memory safe', do you mean avoiding unbounded memory growth?
 
Since we cannot use Google's products in China, I will describe my design in detail in this email instead. Sorry about this.
The file storage I mentioned was inspired by RocketMQ and MetaQ (an Alibaba-inc internal project).
 
First, the read/write model:
I will use a direct buffer pool, by default 100M per buffer with a pool size of 2. When a file (called a StoreFile) is created, it takes a direct buffer from the pool and holds it as an instance field. Messages are written into the buffer, the buffer is written to the file channel asynchronously, and the channel is flushed to disk asynchronously.
Messages can be read in two modes: from the channel or from the direct buffer. When a new file is created and no buffer can be taken from the pool (the pool returns null), the previous file's buffer is returned to the pool. So the current files (as many as the pool size) serve reads from their buffers, and previous files serve reads from the channel.
This design saves direct memory when one physical machine runs many application instances.
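
A minimal sketch of this read/write model, under stated assumptions. The names DirectBufferPool, StoreFileRoller, and every method are hypothetical, and only StoreFile comes from the description above. "Previous file" is read here as the oldest file still holding a buffer. The background threads are left out: writeToChannel would be called periodically by one in the real design.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;

/** Fixed-size pool of direct buffers; take() returns null when none is free. */
final class DirectBufferPool {
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    DirectBufferPool(int poolSize, int bufferBytes) {
        for (int i = 0; i < poolSize; i++) free.add(ByteBuffer.allocateDirect(bufferBytes));
    }

    synchronized ByteBuffer take() { return free.poll(); }

    synchronized void giveBack(ByteBuffer buffer) { buffer.clear(); free.add(buffer); }
}

/** One data file. Reads come from the direct buffer while it is held, else from the channel. */
final class StoreFile implements AutoCloseable {
    private final FileChannel channel;
    private ByteBuffer buffer;    // null once returned to the pool
    private int writtenToChannel; // buffer bytes already copied to the channel

    StoreFile(Path path, ByteBuffer buffer) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        this.buffer = buffer;
    }

    synchronized boolean isBuffered() { return buffer != null; }

    /** Appends a message and returns its offset, or -1 if there is no room (time to roll). */
    synchronized long append(byte[] message) {
        if (buffer == null || buffer.remaining() < message.length) return -1;
        long offset = buffer.position();
        buffer.put(message);
        return offset;
    }

    /** Copies not-yet-written buffer bytes to the channel and optionally forces them to disk. */
    synchronized void writeToChannel(boolean force) throws IOException {
        if (buffer != null) {
            ByteBuffer pending = buffer.duplicate();
            pending.limit(buffer.position()).position(writtenToChannel);
            while (pending.hasRemaining()) writtenToChannel += channel.write(pending, writtenToChannel);
        }
        if (force) channel.force(false);
    }

    /** Reads length bytes at offset: buffer mode if the buffer is held, channel mode otherwise. */
    synchronized byte[] read(long offset, int length) throws IOException {
        byte[] out = new byte[length];
        if (buffer != null) {
            ByteBuffer view = buffer.duplicate();
            view.limit(buffer.position()).position((int) offset);
            view.get(out);
        } else {
            ByteBuffer dst = ByteBuffer.wrap(out);
            while (dst.hasRemaining()) {
                if (channel.read(dst, offset + dst.position()) < 0) throw new EOFException("past end of file");
            }
        }
        return out;
    }

    /** Persists everything, then hands the buffer back; the file becomes read-only. */
    synchronized void releaseBuffer(DirectBufferPool pool) throws IOException {
        if (buffer == null) return;
        writeToChannel(true);
        pool.giveBack(buffer);
        buffer = null;
    }

    @Override
    public synchronized void close() throws IOException { channel.close(); }
}

/** Creates files; when the pool is empty, the oldest buffered file gives its buffer back. */
final class StoreFileRoller {
    private final DirectBufferPool pool;
    private final Deque<StoreFile> buffered = new ArrayDeque<>();

    StoreFileRoller(DirectBufferPool pool) { this.pool = pool; }

    synchronized StoreFile roll(Path path) throws IOException {
        ByteBuffer buffer = pool.take();
        if (buffer == null) { // pool exhausted: reclaim a buffer from an older file
            buffered.removeFirst().releaseBuffer(pool);
            buffer = pool.take();
        }
        StoreFile file = new StoreFile(path, buffer);
        buffered.addLast(file);
        return file;
    }
}
```

With a pool size of 2, rolling a third file moves the first file to channel mode while the second and third keep serving reads from memory.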
 
Second, when and how to activate this feature:
I think this feature is well suited to cases where tracing data is produced quickly but the OAP server or the storage consumes it slowly. File storage matters there because we no longer need to worry about data loss.
I want to add a config key to the agent profile: if its value is set to FILE_CACHE, or defaults to FILE_CACHE, the feature is activated.
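
A minimal sketch of how such a switch could be read, assuming a hypothetical key name `buffer.mode` and a hypothetical BufferMode enum. Only the value FILE_CACHE comes from the proposal. The fallback to MEMORY is a choice of this sketch that keeps the feature opt-in; it is not a settled default.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical switch between the existing in-memory buffer and the proposed file cache. */
enum BufferMode {
    MEMORY, FILE_CACHE;

    /** Hypothetical key name; the real agent config key is not defined in this thread. */
    static final String KEY = "buffer.mode";

    /** Reads the mode from config; a missing or unknown value falls back to MEMORY. */
    static BufferMode from(Properties config) {
        String raw = config.getProperty(KEY, MEMORY.name()).trim().toUpperCase(Locale.ROOT);
        try {
            return valueOf(raw);
        } catch (IllegalArgumentException e) {
            return MEMORY;
        }
    }
}
```

Passing a Properties object with `buffer.mode=file_cache` would select FILE_CACHE, since the value is normalized to upper case before lookup.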
  
Third, performance:
I tested it on my PC (MacBook Pro 2016, 2 cores, 8 GB RAM, -Xms1g -Xmx1g) with a single thread. Writing 1,000,000 messages (2,000 bytes each) to file took about 20 seconds, including generating the random strings and saving them to disk, which is about 50,000 messages per second.
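
As a check on the numbers: 1,000,000 messages / 20 s = 50,000 messages per second, and 50,000 x 2,000 bytes = 100 MB of payload per second. The harness below is a sketch for reproducing a comparable single-threaded measurement. It is not the benchmark used for the figures above. The class name FileWriteBenchmark and the 1 MiB staging buffer are assumptions, and random payload generation is included in the timing as in the test described.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;

final class FileWriteBenchmark {
    /** Writes `messages` random payloads of `messageBytes` each; returns messages per second. */
    static double run(Path file, int messages, int messageBytes) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MiB staging buffer (assumption)
        if (messageBytes > buffer.capacity()) {
            throw new IllegalArgumentException("message larger than staging buffer");
        }
        byte[] payload = new byte[messageBytes];
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < messages; i++) {
                ThreadLocalRandom.current().nextBytes(payload); // generation is timed too
                if (buffer.remaining() < payload.length) drain(buffer, channel);
                buffer.put(payload);
            }
            drain(buffer, channel);
            channel.force(false); // include the final flush to disk in the measurement
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return messages / seconds;
    }

    /** Writes the buffered bytes to the channel and resets the buffer for reuse. */
    private static void drain(ByteBuffer buffer, FileChannel channel) throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) channel.write(buffer);
        buffer.clear();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("trace-bench", ".dat");
        try {
            double rate = run(file, 1_000_000, 2_000);
            System.out.printf("%.0f msg/s, %.1f MB/s%n", rate, rate * 2_000 / 1e6);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A single-threaded number like this does not cover the high-concurrency comparison between memory writes and file writes; that would need several producer threads and the in-memory buffer as a baseline.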
 
If you have any questions, reply to this mail for details.
Thanks.
 
Dao Jun  道君
Alibaba-inc, tjm

> On Jul 14, 2020, at 10:34 AM, Sheng Wu <wu...@gmail.com> wrote:
> 
> Hi
> 
>> I have noticed that skywalking use heap buffer to cache tracing data.
> 
> It could cause data loss, but that is intentional, because we need to keep
> memory safe.
> 
> Back to what you are asking: if you want to build a local file system based
> cache, I think you should submit a design, including
> 1. What is the file write/read model?
> 2. When should this feature be activated, and how?
> 3. What is the performance? Do you have benchmark results comparing
> memory writes and file writes under high concurrency?
> Add more if you think there is more to say.
> 
> You could use this [1] as a design doc template. I look forward to your details.
> 
> [1]
> https://docs.google.com/document/d/1biRE3Bc0cTbs7qnBozUuAxCmeP5n8y0JKJAyzqitLnM/edit
> 
> Sheng Wu 吴晟
> Twitter, wusheng1108
> 

Re: save tracing data to file.

Posted by 道君 <da...@foxmail.com>.

Hi,
Thanks for the reply.

First, what I mean by `inspired`:
I read their source code, so I know how an MQ's file storage works. However, I think those implementations suit a broker rather than an application agent, because they use very large direct buffers and are too complex. So I redesigned and simplified the approach and wrote the code myself. No source code was copied. The implementation is completely different from theirs; I only learned from them.

Second, the limited disk performance in k8s:
I flush the channel buffer to disk asynchronously, so a slow disk does not affect writes. It only affects reads (in channel mode).


Third, reading data from the DataCarrier:
I have not tested that case. The file storage does not care about class instances, only bytes. I think we can choose a high-performance object serialization utility to turn objects into bytes, and I think protostuff is a good choice.
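
A minimal sketch of the "only bytes" idea, assuming a simple length-prefixed record layout ([int length][payload]) that is not specified in this thread. The class RecordFraming and its methods are hypothetical. The serializer is passed in as a plain function so that protostuff or any other library could be plugged in; UTF-8 strings stand in for serialized objects here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class RecordFraming {
    /** Serializes each item and writes it as [int length][payload]. */
    static <T> void writeAll(ByteBuffer out, List<T> items, Function<T, byte[]> serializer) {
        for (T item : items) {
            byte[] payload = serializer.apply(item);
            out.putInt(payload.length).put(payload);
        }
    }

    /** Reads records until the buffer is exhausted and deserializes each payload. */
    static <T> List<T> readAll(ByteBuffer in, Function<byte[], T> deserializer) {
        List<T> items = new ArrayList<>();
        while (in.remaining() >= Integer.BYTES) {
            byte[] payload = new byte[in.getInt()];
            in.get(payload);
            items.add(deserializer.apply(payload));
        }
        return items;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        // String.getBytes is a stand-in for a real object serializer.
        writeAll(buffer, Arrays.asList("span-1", "span-2"), s -> s.getBytes(StandardCharsets.UTF_8));
        buffer.flip(); // switch the buffer from writing to reading
        List<String> decoded = readAll(buffer, b -> new String(b, StandardCharsets.UTF_8));
        System.out.println(decoded);
    }
}
```

The length prefix lets the reader find record boundaries without knowing anything about the object type, which is what keeps the storage layer independent of the serializer.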

If you have any questions, reply to this mail for details.
Thanks.

Dao Jun 道君
Alibaba-inc, tjm


> On Jul 14, 2020, at 4:44 PM, Sheng Wu <wu...@gmail.com> wrote:
> 
> Hi
> 
> Inline
> 
> 
> On Tue, Jul 14, 2020 at 3:32 PM, xkz <da...@foxmail.com> wrote:
> 
>> Hi
>> Thanks for Sheng Wu’s reply. Do you mean avoiding unlimited memory
>> increasing by saying 'because we need to keep memory safe'?
>> 
>> Since we can not use google’s products in China, I try to describe my
>> design in detail in this email. Sorry about this.
>> The file storage I said was inspired by RocketMQ and MetaQ(Alibaba-inc
>> internal project).
>> 
> 
> Please define `inspired`, because
> 1. If you have copied some code from RocketMQ, we need to mark it
> and update the LICENSE to say that we did.
> 2. At the same time, if the code is from MetaQ, we need an Alibaba SGA for
> that code, because it is owned by a company and is not
> open source.
> This is very important to us. Please make sure there is no IP issue.
> 
> 
>> 
>> First, the read/write mode:
>> I will use a direct buffer pool, default 100M per buffer and pool size is
>> 2. When a file is created(called StoreFile), take a direct buffer from pool
>> and set as an instance field, write message in the buffer, async write
>> buffer to file-channel and async flush buffer to disk.
>> Read message has two modes: in channel or in direct buffer. When creating
>> a new file, if take a buffer from buffer pool failed(buffer pool return
>> null), it will return previous file’s buffer to buffer pool. So, current
>> files(depend on buffer pool size) read message in buffer, previous files
>> read message in channel.
>> This design could save direct memory for one physical machine running many
>> application instances.
>> 
>> Second, when and how to activate this feature:
>> This feature I think it’s very suitable for tracing data provided fast but
>> OAP server or STORAGE consumes slowly, so file storage should be very
>> important, because we don’t need to worry about data loss.
>> I want to add a config key in agent profile, if the config value
>> configured as FILE_CACHE or default as FILE_CACHE, the feature can be
>> activated.
>> 
> 
> This looks good to me, and this should be OFF by default, because today
> many k8s deployments have very limited local disk performance.
> 
> 
>> 
>> Third, what is performance:
>> I tested it on my PC(Macbook pro 2016, 2 core 8g RAM, -Xms1g  -Xmx1g) with
>> only one thread, put 1000_000 messages(2000 bytes per message) to file
>> costs about 20 seconds(generate random string and save to disk total cost
>> 20 seconds), 50_000 message per second.
>> 
> 
> 50k/s seems fine, but do you read the data from the DataCarrier and then write
> it to the file? Or does the TracingContext access the file buffer directly? Those
> are different scenarios and have different performance requirements.
> 
> 

Re: save tracing data to file.

Posted by 道君 <da...@foxmail.com>.

Hi,
Thanks for the reply.

I am sure that I am not violating any IP law. I only read the source code, and my implementation is completely different.

I agree to add `inspired by RocketMQ` to the main entrance class comment, and I will do that.

About the code:
I will add the code to the `apm-datacarrier` module and add `protostuff` as a dependency. It will implement the QueueBuffer interface: when a message is added, it is serialized and saved to file.
When the agent shuts down, the files will be deleted. I do not want to implement file recovery, because it is useless in k8s.
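
A minimal sketch of that plan, under stated assumptions. SimpleQueueBuffer is a hypothetical, simplified stand-in for the agent's QueueBuffer contract, and FileQueueBuffer with all its methods is illustrative only. Records use an assumed [int length][payload] layout, the serializer is passed in as a function, and the file is never compacted in this sketch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Function;

/** Hypothetical, simplified stand-in for the agent's queue buffer contract. */
interface SimpleQueueBuffer<T> {
    boolean save(T data);

    void obtain(List<T> consumeList);
}

/** Appends serialized records to a temp file and deletes the file on close. */
final class FileQueueBuffer<T> implements SimpleQueueBuffer<T>, AutoCloseable {
    private final Path file;
    private final FileChannel channel;
    private final Function<T, byte[]> serializer;
    private final Function<byte[], T> deserializer;
    private long readPos;  // next byte to consume
    private long writePos; // end of the last complete record

    FileQueueBuffer(Path dir, Function<T, byte[]> serializer,
                    Function<byte[], T> deserializer) throws IOException {
        this.file = Files.createTempFile(dir, "trace-", ".buf");
        this.channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
        this.serializer = serializer;
        this.deserializer = deserializer;
    }

    Path file() { return file; }

    @Override
    public synchronized boolean save(T data) {
        byte[] payload = serializer.apply(data);
        ByteBuffer record = ByteBuffer.allocate(Integer.BYTES + payload.length);
        record.putInt(payload.length).put(payload).flip();
        try {
            long pos = writePos;
            while (record.hasRemaining()) pos += channel.write(record, pos);
            writePos = pos; // commit only after the whole record is written
            return true;
        } catch (IOException e) {
            return false; // the caller decides whether to drop or retry
        }
    }

    @Override
    public synchronized void obtain(List<T> consumeList) {
        try {
            while (readPos < writePos) {
                int length = readFully(Integer.BYTES).getInt();
                consumeList.add(deserializer.apply(readFully(length).array()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads exactly `length` bytes at readPos and advances readPos. */
    private ByteBuffer readFully(int length) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(length);
        while (dst.hasRemaining()) {
            int n = channel.read(dst, readPos);
            if (n < 0) throw new EOFException("truncated record");
            readPos += n;
        }
        dst.flip();
        return dst;
    }

    /** No recovery is attempted: the file is removed when the buffer is closed. */
    @Override
    public synchronized void close() throws IOException {
        channel.close();
        Files.deleteIfExists(file);
    }
}
```

To delete the files on agent shutdown, close() could be called from a thread registered with Runtime.getRuntime().addShutdownHook, though a hook does not run if the process is killed forcibly.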

But I have one question:
How can I use logging in this module? Logging lives in the `agent-core` module, and `agent-core` depends on `data-carrier`.

If you have any questions, reply to this message for details.
Thanks.

Dao Jun 道君
Alibaba-inc, tjm


> On Jul 14, 2020, at 5:23 PM, Sheng Wu <wu...@gmail.com> wrote:
> 
> On Tue, Jul 14, 2020 at 5:19 PM, 道君 <da...@foxmail.com> wrote:
> 
>> Hi
>> Thanks for reply.
>> 
>> First, what mean about `inspired`:
>> It means I had read their source code then I know how a MQ file storage
>> works, but, I think them not suitable for application agent but suitable
>> for broker, because of it use very large direct buffer and too complex. So
>> I redesign and simplify and code with my own. No source code copy, totally
>> different from them, just learn from them.
>> 
> 
> This is good to know. But please add `inspired by RocketMQ` to the main
> entrance class comment. We should show respect to the original author, even
> if we don't copy anything from them.
> For MetaQ, we can't say anything, as it is open only to you and we can't know
> what happens inside. Just make sure you don't violate anything from the IP law
> perspective, as you are from the Alibaba team, so this would be good for both
> of us.
> 
> 
>> 
>> Second,  k8s limited perf of disk:
>> I async flush channel buffer to disk, so, it will not effect to write,
>> only effect to read(in channel mode).
>> 
> 
> My point is, this feature is optional, and the default is OFF.
> 
> 
>> 
>> 
>> Third, read data from DataCarrier:
>> I don’t test the case, file storage don’t care about the class instance,
>> only bytes. I think we can choose a high perf object serialize util to
>> serialize object to bytes. I think protostuff is good.
>> 
> 
> This is about where you plan to add the code in the agent. Could you
> explain this more clearly?
> 
> 