Posted to dev@teaclave.apache.org by ly137062117 <no...@github.com> on 2020/06/22 12:17:25 UTC

[apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

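Recently, while running GBDT training in Teaclave, I found that training on a large sample file (about 1.7 GB) uses a very large amount of memory.
So I would like to ask: when the Runtime is used to get a reader for the input file and a writer for the output file, does reader.lines() load the entire file into memory before operating on it, or does it stream?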

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368

Re: [apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

Posted by ly137062117 <no...@github.com>.
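@mssun In that case, inside the TEE, does wrapping an io::Write in a BufWriter also give streaming writes? And if so, does the TEE encrypt the contents of the output file line by line, as each line is written?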

Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368#issuecomment-647857205

Re: [apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

Posted by Mingshen Sun <no...@github.com>.
I'm closing this issue. Feel free to reopen or create a new one if you have further questions. Thanks.

Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368#issuecomment-652115258

Re: [apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

Posted by Mingshen Sun <no...@github.com>.
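`BufReader` implements the `BufRead` trait, which is what "streaming" means here.

The memory usage may be caused by something else, for example `samples`:

https://github.com/apache/incubator-teaclave/blob/0316757e2dfe748185b84d1bff5ae04701b46c8f/function/src/gbdt_train.rs#L135

Or it may come from another issue; finding it requires a detailed review/profile of the code.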
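
To illustrate the distinction, here is a minimal sketch using only the Rust standard library. It contrasts a streaming pass over `lines()` with a pass that keeps every parsed row in a `Vec`. The function names `streaming_sum` and `collect_samples` and the comma-separated input format are assumptions made for illustration; they are not taken from gbdt_train.rs.

```rust
use std::io::{BufRead, BufReader, Read};

/// Streams over the input: only the current line is held in memory at a time.
fn streaming_sum<R: Read>(input: R) -> f64 {
    let reader = BufReader::new(input);
    let mut sum = 0.0;
    for line in reader.lines() {
        let line = line.expect("read error");
        sum += line
            .split(',')
            .filter_map(|v| v.trim().parse::<f64>().ok())
            .sum::<f64>();
    }
    sum
}

/// Also reads line by line, but retains every parsed row,
/// so memory grows with the size of the file.
fn collect_samples<R: Read>(input: R) -> Vec<Vec<f64>> {
    BufReader::new(input)
        .lines()
        .map(|line| {
            line.expect("read error")
                .split(',')
                .filter_map(|v| v.trim().parse::<f64>().ok())
                .collect::<Vec<f64>>()
        })
        .collect()
}
```

Both functions read the input lazily, but only the first keeps memory use independent of the input size. For example, `streaming_sum("1.0,2.0\n3.0,4.0\n".as_bytes())` adds up the four values without storing them, while `collect_samples` on the same input returns a two-row `Vec` that stays in memory.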

Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368#issuecomment-647784381

Re: [apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

Posted by Mingshen Sun <no...@github.com>.
The secure file system used by the Teaclave execution service is based on `protected_fs` (https://github.com/apache/incubator-teaclave/blob/master/common/protected_fs_rs/src/sgx_tprotected_fs.rs), which provides a POSIX-compatible file I/O interface.

As for encryption, it is not "write line by line, encrypt line by line". It works block by block and provides an LRU cache.

To learn more, see the protected fs code: https://github.com/apache/incubator-teaclave/tree/master/common/protected_fs_rs/protected_fs_c/sgx_tprotected_fs
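
As a rough analogy for the `BufWriter` part of the question, and not a description of protected_fs internals, the sketch below uses only the standard library to show that a `BufWriter` forwards data to the underlying writer in buffer-sized chunks rather than once per line. The `CountingWriter` type, the `chunk_sizes_for` helper, and the tiny buffer capacity are hypothetical, chosen for illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that records the size of every write it receives.
struct CountingWriter {
    chunk_sizes: Vec<usize>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.chunk_sizes.push(buf.len());
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `lines` short lines through a BufWriter with the given capacity
/// and returns the sizes of the writes that reached the sink.
fn chunk_sizes_for(lines: usize, capacity: usize) -> io::Result<Vec<usize>> {
    let sink = CountingWriter { chunk_sizes: Vec::new() };
    let mut writer = BufWriter::with_capacity(capacity, sink);
    for i in 0..lines {
        // "row" + one digit + newline = 5 bytes per line
        writeln!(writer, "row{}", i % 10)?;
    }
    // into_inner flushes whatever is still buffered before returning the sink
    let sink = writer.into_inner().map_err(|e| e.into_error())?;
    Ok(sink.chunk_sizes)
}
```

With ten 5-byte lines and a 16-byte buffer, `chunk_sizes_for(10, 16)` reports fewer than ten writes, none larger than the buffer. How protected_fs sizes, encrypts, and caches its blocks is a separate matter, defined by its own code.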

Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368#issuecomment-647861215

Re: [apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

Posted by Mingshen Sun <no...@github.com>.
Closed #368.

Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368#event-3500134877

Re: [apache/incubator-teaclave] Do functions support streaming reads and writes of input files? (#368)

Posted by TX <no...@github.com>.
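The memory usage is mainly caused by the gbdt-rs algorithm implementation. During training, gbdt-rs needs all of the data for its computation. Different `training_optimization_level` settings lead to different memory access patterns and internal data representations during training, so memory usage differs as well. Specifically, `training_optimization_level=0` or `1` uses less memory than `training_optimization_level=2`.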

Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-teaclave/issues/368#issuecomment-647873489