Posted to dev@spark.apache.org by Dhruv Kumar <dh...@umn.edu.INVALID> on 2021/04/21 17:19:47 UTC

Benchmarks for Many-to-Many Joins

Hi

I wanted to ask if anyone knows of any datasets or benchmarks that I can use for evaluating many-to-many joins (as depicted in the attached snapshot). I looked at the TPC-H <http://tpc.org/tpch/> and TPC-DS <http://www.tpc.org/tpcds/> benchmarks but, surprisingly, they mostly contain one-to-many joins, so they were not much help.





Thanks
Dhruv

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me <http://dhruvkumar.me/>

Re: Benchmarks for Many-to-Many Joins

Posted by waltercai <wa...@cs.washington.edu>.
Hi Dhruv,

One option is the Join Order Benchmark
<https://github.com/gregrahn/join-order-benchmark>; it has become very
popular in DB research over the past couple of years and features
many-to-many joins. Another option is crafting many-to-many queries from
graph datasets such as social media or travel networks.
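
As a rough illustration of the graph-dataset idea, here is a minimal sketch
in plain Python. The edge list and the helper names (two_hop_join,
key_multiplicity) are made up for illustration and do not come from any
benchmark. It self-joins a "follows" edge list on a.dst = b.src to get
two-hop paths; the join is many-to-many wherever a node has several
incoming and several outgoing edges.

```python
from collections import defaultdict

# Hypothetical "follows" edge list (src, dst); invented for illustration.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("carol", "erin"),
    ("dave", "erin"),
    ("dave", "frank"),
]


def two_hop_join(edges):
    """Self-join edges a and b on a.dst == b.src; returns (a.src, mid, b.dst)."""
    by_src = defaultdict(list)
    for src, dst in edges:
        by_src[src].append(dst)
    return [
        (src, mid, dst2)
        for src, mid in edges
        for dst2 in by_src.get(mid, [])
    ]


def key_multiplicity(edges):
    """Count how often each join key occurs on each side of the self-join."""
    left = defaultdict(int)   # occurrences of the key as a.dst
    right = defaultdict(int)  # occurrences of the key as b.src
    for src, dst in edges:
        left[dst] += 1
        right[src] += 1
    return dict(left), dict(right)


paths = two_hop_join(edges)
left, right = key_multiplicity(edges)

# A key is many-to-many when it repeats on both sides of the join.
many_to_many_keys = sorted(
    k for k in left if left[k] > 1 and right.get(k, 0) > 1
)

print(len(paths), "two-hop paths")
print("many-to-many join keys:", many_to_many_keys)
```

On a real social or travel graph, the same per-key counts give a quick check
of how skewed the join is before running it at scale.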

Walter


