Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/04/04 03:19:40 UTC

is there a way to persist the lineages generated by spark?

Hi All,

I am wondering if there is a way to persist the lineages that Spark
generates underneath. Some of our clients want us to prove that the result
of a computation shown on a dashboard is correct. If we could show the
lineage of transformations executed to reach the result, that could be the
Q.E.D. moment, but I am not sure this is even possible with Spark.

Thanks,
kant

Re: is there a way to persist the lineages generated by spark?

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

I think that every client wants a validation process, but showing lineage
is an approach that they are not asking for, and it may not be the right
way to prove correctness.


Regards,
Gourav


Re: is there a way to persist the lineages generated by spark?

Posted by ayan guha <gu...@gmail.com>.
How about storing the logical plans (or the output of toDebugString, in the
case of an RDD) in an external file on the driver?
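
A minimal sketch of this suggestion in Python, assuming PySpark. The helper
`save_lineage`, its `label` argument, and the file path are hypothetical
names. The only Spark call it relies on is `RDD.toDebugString()`, which
PySpark returns as bytes. A DataFrame's plans are printed by
`df.explain(True)`; capturing that output is left out here.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_lineage(lineage, path, label="job"):
    """Append a lineage description to a text file on the driver.

    `lineage` is the text to record, e.g. the value returned by
    rdd.toDebugString() (bytes in PySpark) or a captured query plan.
    `path` and `label` are hypothetical names chosen for this sketch.
    """
    if isinstance(lineage, bytes):
        lineage = lineage.decode("utf-8")
    stamp = datetime.now(timezone.utc).isoformat()
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # Append, so that every run leaves its own timestamped entry.
    with out.open("a", encoding="utf-8") as fh:
        fh.write(f"=== {label} @ {stamp} ===\n{lineage}\n")
    return out


# Usage on the driver (assumes an existing RDD named `rdd`):
#   save_lineage(rdd.toDebugString(), "lineage/dashboard.log", label="daily_totals")
```

Note that this records a description of the transformations, not the input
data, so on its own it documents a run rather than making it replayable.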




-- 
Best Regards,
Ayan Guha

Re: is there a way to persist the lineages generated by spark?

Posted by Tom Lynch <to...@machinelearningdeveloper.com>.
This is not quite what you are asking, but I often save intermediate
results to Parquet files so I can diagnose problems and rebuild data
from a known good state without having to re-run every processing step.
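
A sketch of that habit, assuming PySpark DataFrames. The helper
`checkpoint_step`, the step name, and the base directory are hypothetical.
The sketch only uses `DataFrame.write.mode("overwrite").parquet(path)` and
leaves reading back with `spark.read.parquet(path)` to the caller.

```python
import os


def checkpoint_step(df, base_dir, step_name):
    """Write one intermediate DataFrame to its own Parquet directory.

    `df` is expected to be a pyspark.sql.DataFrame; `base_dir` and
    `step_name` are hypothetical names used only in this sketch.
    Returns the path so a later run can reload from it.
    """
    path = os.path.join(base_dir, step_name)
    # Overwrite so that re-running a step replaces its previous snapshot.
    df.write.mode("overwrite").parquet(path)
    return path


# Usage (assumes an existing SparkSession `spark` and DataFrame `cleaned`):
#   path = checkpoint_step(cleaned, "/data/checkpoints", "01_cleaned")
#   cleaned = spark.read.parquet(path)  # restart from the known good state
```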

On Fri, Apr 7, 2017 at 1:08 AM, kant kodali <ka...@gmail.com> wrote:

> Yes, lineage that is actually replayable is what the validation process
> needs, so that we can answer questions such as how a system arrived at
> state S at time T. I guess a good analogy is event sourcing.

Re: is there a way to persist the lineages generated by spark?

Posted by kant kodali <ka...@gmail.com>.
Yes, lineage that is actually replayable is what the validation process
needs, so that we can answer questions such as how a system arrived at
state S at time T. I guess a good analogy is event sourcing.
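
To make the event sourcing analogy concrete, here is a small sketch in plain
Python (no Spark involved). The `Event` type, the counter-style state, and
`state_at` are all hypothetical; the point is only that state S at time T is
recomputed by replaying the recorded events up to T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int   # logical time at which the event was recorded
    key: str
    delta: int       # change applied to the running total for `key`


def state_at(events, t):
    """Replay every event with timestamp <= t and return the resulting state."""
    state = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.timestamp > t:
            break
        state[ev.key] = state.get(ev.key, 0) + ev.delta
    return state


log = [
    Event(1, "orders", 5),
    Event(2, "orders", 3),
    Event(3, "refunds", 1),
    Event(4, "orders", -2),
]

print(state_at(log, 2))  # {'orders': 8}
print(state_at(log, 4))  # {'orders': 6, 'refunds': 1}
```

Replay of this kind needs the inputs to be kept as well as the
transformations; a lineage description alone records only the latter.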


On Thu, Apr 6, 2017 at 10:30 PM, Jörn Franke <jo...@gmail.com> wrote:

> I do think this is the right way: you will have to test with test data,
> verifying that the actual output of the calculation matches the expected
> output. Even if the logical plan is correct, your calculation might not
> be. For example, there can be bugs in Spark or in the UI, or (very often)
> the client describes a calculation but the description turns out to be wrong.

Re: is there a way to persist the lineages generated by spark?

Posted by Jörn Franke <jo...@gmail.com>.
I do think this is the right way: you will have to test with test data, verifying that the actual output of the calculation matches the expected output.
Even if the logical plan is correct, your calculation might not be. For example, there can be bugs in Spark or in the UI, or (very often) the client describes a calculation but the description turns out to be wrong.
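
As a sketch of what such a check can look like, the plain-Python test below
runs a calculation on a small hand-built input and compares it with a result
worked out by hand. `total_per_customer` and the sample rows are hypothetical
stand-ins; for a Spark job the same pattern would apply to the job's
transformation run on a small local dataset.

```python
from collections import defaultdict


def total_per_customer(rows):
    """Sum the `amount` of each row per `customer` (the calculation under test)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def test_total_per_customer():
    # Small input whose correct answer can be worked out by hand.
    rows = [
        {"customer": "a", "amount": 10},
        {"customer": "b", "amount": 4},
        {"customer": "a", "amount": -3},
    ]
    expected = {"a": 7, "b": 4}
    assert total_per_customer(rows) == expected


test_total_per_customer()
```

The expected values come from the client's description of the calculation,
so a failing test can also point to a wrong description rather than wrong code.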


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

