Posted to issues@arrow.apache.org by "wanweiqiangintel (via GitHub)" <gi...@apache.org> on 2023/05/06 07:28:33 UTC
[GitHub] [arrow] wanweiqiangintel opened a new issue, #35460: can we use simdjson to replace rapidjson
wanweiqiangintel opened a new issue, #35460:
URL: https://github.com/apache/arrow/issues/35460
### Describe the enhancement requested
According to performance results reported by the simdjson community, the simdjson library uses three-quarters fewer instructions than the state-of-the-art parser RapidJSON, and its throughput is much higher than RapidJSON's:
![image](https://user-images.githubusercontent.com/89506884/236609780-20a8787b-9000-4570-a974-ec7107ca8879.png)
So can we replace RapidJSON with simdjson to implement the JSON parser?
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] mapleFU commented on issue #35460: can we use simdjson to replace rapidjson
Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545741395
It seems the writer can keep using the original logic, while the parser can make full use of simdjson?
[GitHub] [arrow] lemire commented on issue #35460: can we use simdjson to replace rapidjson
Posted by "lemire (via GitHub)" <gi...@apache.org>.
lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545020139
We are available to help.
Note that simdjson is used by Apache Doris and ClickHouse.
[GitHub] [arrow] kou commented on issue #35460: can we use simdjson to replace rapidjson
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545853564
Is there any merit in using both RapidJSON and simdjson?
I think that using only one of RapidJSON or simdjson will reduce our maintenance cost.
[GitHub] [arrow] kou commented on issue #35460: can we use simdjson to replace rapidjson
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1537086573
If simdjson is faster than RapidJSON for our use case too, I'm OK with this.
Could you try this and share the result of our benchmark?
https://github.com/apache/arrow/blob/main/cpp/src/arrow/json/parser_benchmark.cc
[GitHub] [arrow] pitrou commented on issue #35460: [C++] can we use simdjson to replace rapidjson
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1561546149
Agreed with @kou , we probably want to avoid depending on two different JSON libraries.
Interested people should try working on a PR.
[GitHub] [arrow] pitrou commented on issue #35460: [C++] can we use simdjson to replace rapidjson
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1561548342
I'm skeptical switching to simdjson would improve performance a lot, btw. Parsing is only a small part of the work necessary to convert JSON to Arrow.
[GitHub] [arrow] kou commented on issue #35460: can we use simdjson to replace rapidjson
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545720496
Great!