Posted to issues@arrow.apache.org by "wanweiqiangintel (via GitHub)" <gi...@apache.org> on 2023/05/06 07:28:33 UTC

[GitHub] [arrow] wanweiqiangintel opened a new issue, #35460: can we use simdjson to replace rapidjson

wanweiqiangintel opened a new issue, #35460:
URL: https://github.com/apache/arrow/issues/35460

   ### Describe the enhancement requested
   
   According to the performance results reported by the simdjson community, the simdjson library uses three-quarters fewer instructions than the state-of-the-art parser RapidJSON, and its throughput is much higher than RapidJSON's:
   ![image](https://user-images.githubusercontent.com/89506884/236609780-20a8787b-9000-4570-a974-ec7107ca8879.png)
   
   So can we replace RapidJSON with simdjson to implement the JSON parser?
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #35460: can we use simdjson to replace rapidjson

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545741395

   It seems the writer could keep its original logic, while the parser could make full use of simdjson?




[GitHub] [arrow] lemire commented on issue #35460: can we use simdjson to replace rapidjson

Posted by "lemire (via GitHub)" <gi...@apache.org>.
lemire commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545020139

   We are available to help.
   
   Note that simdjson is used by Apache Doris and ClickHouse.




[GitHub] [arrow] kou commented on issue #35460: can we use simdjson to replace rapidjson

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545853564

   Is there any merit in using both RapidJSON and simdjson?
   
   I think that using only one of them, either RapidJSON or simdjson, will reduce our maintenance cost.




[GitHub] [arrow] kou commented on issue #35460: can we use simdjson to replace rapidjson

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1537086573

   If simdjson is faster than RapidJSON for our use case too, I'm OK with this.
   
   Could you try this and share the results of our benchmark?
   https://github.com/apache/arrow/blob/main/cpp/src/arrow/json/parser_benchmark.cc




[GitHub] [arrow] pitrou commented on issue #35460: [C++] can we use simdjson to replace rapidjson

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1561546149

   Agreed with @kou, we probably want to avoid depending on two different JSON libraries.
   
   Interested people should try working on a PR.




[GitHub] [arrow] pitrou commented on issue #35460: [C++] can we use simdjson to replace rapidjson

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1561548342

   I'm skeptical that switching to simdjson would improve performance a lot, btw. Parsing is only a small part of the work necessary to convert JSON to Arrow.




[GitHub] [arrow] kou commented on issue #35460: can we use simdjson to replace rapidjson

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35460:
URL: https://github.com/apache/arrow/issues/35460#issuecomment-1545720496

   Great!

