Posted to dev@spark.apache.org by Walter Cai <wa...@cs.washington.edu> on 2021/04/22 01:08:58 UTC

modifying spark's optimizer for research

Hi,

I'm Walter, a PhD student at the University of Washington. My goal is to
implement a prototype modification to Spark's optimizer to showcase and
experiment with some of my PhD work. I was hoping to set up a chat with
somebody who is familiar with Catalyst about the best place to start
modifying.

Thanks,
Walter
walter@cs.washington.edu

Re: modifying spark's optimizer for research

Posted by Walter Cai <wa...@cs.washington.edu>.
Hi Cheng Su and All,

Thanks for your reply; the change I'm attempting to make would be a
significant philosophical shift in how optimizers currently handle
cardinality estimation. With that in mind, I think it would be wiser to
first build a prototype/proof of concept rather than follow the
traditional pull request and review workflow.

For some more context on my method: the central idea of my work is to lean
heavily towards overestimation
<https://dl.acm.org/doi/10.1145/3299869.3319894> during the cardinality
estimation process, using the elegant entropic bounding
<https://arxiv.org/pdf/1612.02503.pdf> framework. Particularly for
multi-join queries, this avoids the underestimation problem that pervades
modern systems. So far my work has focused on single-node DBs, and scaling
to multi-node systems presents new hurdles, which is why I'm reaching out
here.
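To give a flavor of the overestimation idea (this is only a toy sketch of
the general approach, not my actual framework): for a triangle join
R(a,b) ⋈ S(b,c) ⋈ T(c,a), the AGM bound with fractional edge cover
(1/2, 1/2, 1/2) guarantees the join size is at most sqrt(|R|·|S|·|T|), so
the estimate can overshoot but never undershoot the true cardinality:

```python
import math

def agm_triangle_bound(r: int, s: int, t: int) -> float:
    """AGM upper bound on the triangle join R(a,b) x S(b,c) x T(c,a),
    using the fractional edge cover (1/2, 1/2, 1/2)."""
    return math.sqrt(r * s * t)

def true_triangle_count(R, S, T):
    """Exact join size for small relations, for comparison."""
    return sum(
        1
        for (a, b) in R
        for (b2, c) in S if b == b2
        for (c2, a2) in T if c == c2 and a == a2
    )

# Small example: the bound always dominates the true cardinality.
R = [(1, 2), (1, 3), (2, 3)]
S = [(2, 3), (3, 1), (3, 4)]
T = [(3, 1), (1, 2), (4, 1)]
bound = agm_triangle_bound(len(R), len(S), len(T))   # sqrt(27) ~ 5.2
actual = true_triangle_count(R, S, T)                # 3 triangles
assert actual <= bound
```

A cost model fed such bounds is pessimistic by construction, which is the
property that a naive selectivity-multiplication estimator lacks.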

Thanks,
Walter

On Wed, Apr 21, 2021 at 11:46 PM Cheng Su <ch...@fb.com> wrote:

> Hello Walter,
>
> Just FYI - https://spark.apache.org/contributing.html is the general
> guide for how to contribute to Spark.
>
> > implement a prototype modification to spark's optimizer to
> > exhibit/experiment some of my PhD work
>
> Maybe you could share some links or pointers to the work you have done?
> This can help give people some basic ideas and let them provide more
> specific help.
>
> Thanks,
> Cheng Su
>
> *From: *Walter Cai <wa...@cs.washington.edu>
> *Date: *Wednesday, April 21, 2021 at 6:09 PM
> *To: *"dev@spark.apache.org" <de...@spark.apache.org>
> *Subject: *modifying spark's optimizer for research
>
> Hi,
>
> I'm Walter, a PhD student at the University of Washington. My goal is to
> implement a prototype modification to Spark's optimizer to showcase and
> experiment with some of my PhD work. I was hoping to set up a chat with
> somebody who is familiar with Catalyst about the best place to start
> modifying.
>
> Thanks,
> Walter
> walter@cs.washington.edu

Re: modifying spark's optimizer for research

Posted by Cheng Su <ch...@fb.com.INVALID>.
Hello Walter,

Just FYI - https://spark.apache.org/contributing.html is the general guide for how to contribute to Spark.

> implement a prototype modification to spark's optimizer to exhibit/experiment some of my PhD work

Maybe you could share some links or pointers to the work you have done? This can help give people some basic ideas and let them provide more specific help.

Thanks,
Cheng Su

From: Walter Cai <wa...@cs.washington.edu>
Date: Wednesday, April 21, 2021 at 6:09 PM
To: "dev@spark.apache.org" <de...@spark.apache.org>
Subject: modifying spark's optimizer for research

Hi,

I'm Walter, a PhD student at the University of Washington. My goal is to implement a prototype modification to Spark's optimizer to showcase and experiment with some of my PhD work. I was hoping to set up a chat with somebody who is familiar with Catalyst about the best place to start modifying.

Thanks,
Walter
walter@cs.washington.edu