Posted to user@arrow.apache.org by Jiangtao Peng <pe...@gmail.com> on 2022/09/14 06:34:46 UTC
[C++][Gandiva] string expression evaluation performance issue using mimalloc
Hi all,
Arrow uses jemalloc as its default memory allocator. For certain reasons I want
to use mimalloc instead, but there seems to be a big performance difference
between the two allocators.
Here are my steps.
I built Arrow with these CMake options, switching jemalloc and mimalloc
between runs:
-DCMAKE_BUILD_TYPE=debug \
-DARROW_JEMALLOC=OFF|ON \
-DARROW_MIMALLOC=ON|OFF \
-DARROW_GANDIVA=ON \
-DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
-DARROW_BUILD_TESTS=ON
Then I wrote a simple test case:
#include <chrono>
#include <iostream>
#include <memory>
#include <string>

#include <gtest/gtest.h>

#include "arrow/api.h"
#include "gandiva/projector.h"
#include "gandiva/tree_expr_builder.h"
#include "gandiva/tests/test_util.h"  // TestConfiguration() test helper

namespace gandiva {

using arrow::field;
using arrow::utf8;

void TestPerf(int64_t char_length, int64_t num_records) {
  // schema for input fields
  auto field_a = field("a", utf8());
  auto schema = arrow::schema({field_a});
  // output fields
  auto res = field("res", utf8());
  // expression: res = upper(a)
  auto node_a = TreeExprBuilder::MakeField(field_a);
  auto upper_a = TreeExprBuilder::MakeFunction("upper", {node_a}, utf8());
  auto expr = TreeExprBuilder::MakeExpression(upper_a, res);
  // Build a projector for the expressions.
  std::shared_ptr<Projector> projector;
  auto status = Projector::Make(schema, {expr}, TestConfiguration(), &projector);
  EXPECT_TRUE(status.ok()) << status.message();
  // Input column: num_records copies of a string of char_length 'a's.
  std::string val(char_length, 'a');
  arrow::StringBuilder builder;
  for (int64_t i = 0; i < num_records; i++) {
    EXPECT_TRUE(builder.Append(val).ok());
  }
  std::shared_ptr<arrow::StringArray> array_a;
  EXPECT_TRUE(builder.Finish(&array_a).ok());
  // prepare input record batch
  auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array_a});
  // Time only the evaluation; steady_clock is monotonic.
  auto start = std::chrono::steady_clock::now();
  // Evaluate expression
  arrow::ArrayVector outputs;
  status = projector->Evaluate(*in_batch, arrow::default_memory_pool(), &outputs);
  EXPECT_TRUE(status.ok()) << status.message();
  auto elapsed = std::chrono::steady_clock::now() - start;
  std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()
            << "ms" << std::endl;
}

}  // namespace gandiva
TestPerf(20, 10000);
TestPerf(20, 100000);
TestPerf(200, 10000);
TestPerf(200, 100000);
TestPerf(2000, 10000);
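The timing around Evaluate() can also be factored into a small helper. Below is a minimal standalone sketch that uses std::chrono::steady_clock, which is monotonic and therefore unaffected by wall-clock adjustments. ElapsedMs is a hypothetical helper name for illustration only. It is not an Arrow or Gandiva API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: runs `fn` exactly once and returns the elapsed time
// in whole milliseconds, measured with the monotonic steady_clock.
template <typename Fn>
int64_t ElapsedMs(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto end = std::chrono::steady_clock::now();
  assert(end >= start);  // guaranteed by steady_clock's monotonicity
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
      .count();
}
```

Inside TestPerf, it would wrap only the evaluation step, e.g. `ElapsedMs([&] { status = projector->Evaluate(*in_batch, arrow::default_memory_pool(), &outputs); })`. That way, building the input batch is excluded from the measurement.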
This case evaluates the expression "upper(a)", where each value of "a" is 20,
200 or 2000 characters long. Evaluation times:
| char_length | num_records | mimalloc (ms) | jemalloc (ms) |
| 20          | 10000       | 29            | 3             |
| 20          | 100000      | 2686          | 26            |
| 200         | 10000       | 954           | 11            |
| 200         | 100000      | 220153        | 118           |
| 2000        | 10000       | 21162         | 89            |
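To put the table in relative terms, here is a small sketch that derives slowdown and growth factors from the measured numbers. Timing, DataBytes, Slowdown, MimallocGrowth and JemallocGrowth are hypothetical names for illustration. The only inputs are the values in the table, and DataBytes assumes the string data size is simply char_length × num_records.

```cpp
#include <cassert>
#include <cstdint>

// One row of the timing table (times in milliseconds).
struct Timing {
  int64_t char_length;
  int64_t num_records;
  double mimalloc_ms;
  double jemalloc_ms;
};

// Total bytes of string data in the input column (same size for upper(a)'s output).
int64_t DataBytes(const Timing& t) { return t.char_length * t.num_records; }

// How many times slower the mimalloc build was than the jemalloc build.
double Slowdown(const Timing& t) {
  assert(t.jemalloc_ms > 0);
  return t.mimalloc_ms / t.jemalloc_ms;
}

// Factor by which evaluation time grows from row `a` to row `b`.
double MimallocGrowth(const Timing& a, const Timing& b) {
  assert(a.mimalloc_ms > 0);
  return b.mimalloc_ms / a.mimalloc_ms;
}
double JemallocGrowth(const Timing& a, const Timing& b) {
  assert(a.jemalloc_ms > 0);
  return b.jemalloc_ms / a.jemalloc_ms;
}
```

Worked through for the table: at char_length 20, going from 10000 to 100000 records multiplies the mimalloc time by about 93 (2686 / 29) but the jemalloc time by only about 9 (26 / 3). At char_length 200, the factors are about 231 and 11. The rows (20, 100000) and (200, 10000) both hold 2,000,000 bytes of string data, yet take 2686 ms and 954 ms respectively with mimalloc.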
Is this performance gap expected? Are there other compile options I should be
aware of? How can I get better performance with mimalloc?
Best,
Jiangtao