Posted to users@apex.apache.org by Ashwin Chandra Putta <as...@gmail.com> on 2016/04/25 09:00:33 UTC

Lineage support on apex

Hi All,

I have heard of a few use cases that ask for lineage support. On Apex,
this seems to mean the ability to uniquely track each tuple as it flows
through the DAG. It further boils down to being able to track every tuple
going into each operator and the corresponding tuple going out of the
operator. Here is a quick list I put together to describe some
requirements for lineage support on Apex. Please feel free to improve or
add to it. Also, please respond with ideas on how we can solve this on the
Apex platform.

When lineage is enabled:
1. We should be able to track each tuple as it enters and exits an
operator, e.g. enrichment.
2. We should be able to track all the tuples that contributed to a tuple
that is emitted, e.g. dimensions computation.
3. We should be able to track all the tuples that contributed to all the
tuples emitted by the operator, e.g. join?
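
To make the three cases above concrete, here is a minimal Java sketch of
one possible bookkeeping scheme. It is not an Apex API: the LineageSketch
class, its Edge type, and the record and contributors methods are all
hypothetical names, and the tuple ids are assumed to be assigned somewhere
upstream. Each operator logs which input ids led to which output ids, and
a backward walk over that log recovers every contributing tuple.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical lineage log; none of these names come from Apex. */
public class LineageSketch {

    /** Links the tuples an operator consumed to the tuples it emitted from them. */
    public static final class Edge {
        final String operator;
        final List<Long> inputIds;
        final List<Long> outputIds;

        Edge(String operator, List<Long> inputIds, List<Long> outputIds) {
            this.operator = operator;
            this.inputIds = new ArrayList<>(inputIds);
            this.outputIds = new ArrayList<>(outputIds);
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    /** Called by an operator whenever a group of inputs produces a group of outputs. */
    public void record(String operator, List<Long> inputIds, List<Long> outputIds) {
        edges.add(new Edge(operator, inputIds, outputIds));
    }

    /** Returns the ids of all tuples that directly or transitively contributed to tupleId. */
    public Set<Long> contributors(long tupleId) {
        Set<Long> seen = new TreeSet<>();
        Deque<Long> pending = new ArrayDeque<>();
        pending.push(tupleId);
        while (!pending.isEmpty()) {
            long current = pending.pop();
            for (Edge e : edges) {
                if (e.outputIds.contains(current)) {
                    for (Long in : e.inputIds) {
                        if (seen.add(in)) {
                            pending.push(in);
                        }
                    }
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        LineageSketch log = new LineageSketch();
        // Case 1, enrichment: one tuple in, one tuple out.
        log.record("enrich", Arrays.asList(1L), Arrays.asList(10L));
        log.record("enrich", Arrays.asList(2L), Arrays.asList(11L));
        // Case 2, dimensions computation: many tuples contribute to one aggregate.
        log.record("dimensions", Arrays.asList(10L, 11L), Arrays.asList(20L));
        // Case 3, join: a set of inputs contributes to a set of outputs.
        log.record("join", Arrays.asList(20L, 3L), Arrays.asList(30L, 31L));

        System.out.println(log.contributors(30L)); // prints [1, 2, 3, 10, 11, 20]
    }
}
```

The linear scan over all edges keeps the sketch short. A real
implementation would presumably index edges by output id and persist them
outside operator memory.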

-- 

Regards,
Ashwin.

Re: Lineage support on apex

Posted by Atri Sharma <at...@gmail.com>.
+1

I like this feature

On Mon, Apr 25, 2016 at 7:52 PM, Amol Kekre <am...@datatorrent.com> wrote:

> This is very valuable. I have heard the following feature sets from
> customers.
>
> - Ability to spool to hdfs (or any DFS interface)
> - Ability to pick and choose the tuple, i.e. not every tuple may need to be
> tracked
> - Minimal performance hit
> - The current api remains as is
> - Ability to get the content based on tuple-id
>
> Apex should enable this with minimal or no coding from users
>
> Thks,
> Amol
>
>
> On Mon, Apr 25, 2016 at 12:00 AM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
> > Hi All,
> >
> > I have heard of a few use cases where lineage support is asked for. On
> > apex, it seems to be an ask for the ability to uniquely track each tuple
> > as it flows through the DAG. It further boils down to being able to track
> > every tuple going into each operator and the corresponding tuple going
> > out of the operator. Here is a quick list I put together to describe some
> > requirements for lineage support on apex. Please feel free to improve or
> > add to it. Also, please respond with ideas on how we can solve this on
> > the apex platform.
> >
> > When lineage is enabled,
> > 1. We should be able to track each tuple as it enters and exits an
> > operator. eg: enrichment.
> > 2. We should be able to track all the tuples that contributed to a tuple
> > that is emitted. eg: dimensions computation.
> > 3. We should be able to track all the tuples that contributed to all the
> > tuples emitted by the operator. eg: join?
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>



-- 
Regards,

Atri
*l'apprenant*

Re: Lineage support on apex

Posted by Amol Kekre <am...@datatorrent.com>.
This is very valuable. I have heard the following feature requests from
customers.

- Ability to spool to HDFS (or any DFS interface)
- Ability to pick and choose the tuple, i.e. not every tuple may need to be
tracked
- Minimal performance hit
- The current API remains as is
- Ability to get the content based on tuple-id

Apex should enable this with minimal or no coding from users
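
As a rough illustration of the "pick and choose" and "get the content
based on tuple-id" items, here is a small Java sketch. Everything in it is
an assumption made for illustration and not part of Apex: the
SelectiveLineageSketch class, the TupleStore interface, and the in-memory
store that stands in for a DFS-backed spool. A predicate decides which
tuples are stored, so untracked tuples cost only an id assignment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical selective tracker; none of these names come from Apex. */
public class SelectiveLineageSketch {

    /** Storage abstraction; a DFS-backed store would implement the same two methods. */
    public interface TupleStore {
        void put(long tupleId, String content);
        String get(long tupleId);
    }

    /** In-memory stand-in used only to keep the sketch self-contained. */
    public static final class InMemoryStore implements TupleStore {
        private final Map<Long, String> contents = new HashMap<>();

        @Override
        public void put(long tupleId, String content) {
            contents.put(tupleId, content);
        }

        @Override
        public String get(long tupleId) {
            return contents.get(tupleId);
        }
    }

    private final Predicate<String> shouldTrack;
    private final TupleStore store;
    private long nextId = 0;

    public SelectiveLineageSketch(Predicate<String> shouldTrack, TupleStore store) {
        this.shouldTrack = shouldTrack;
        this.store = store;
    }

    /** Assigns an id to every tuple but stores only those the predicate selects. */
    public long observe(String tuple) {
        long id = nextId++;
        if (shouldTrack.test(tuple)) {
            store.put(id, tuple);
        }
        return id;
    }

    /** Returns the stored content for a tuple id, or null if it was not tracked. */
    public String lookup(long tupleId) {
        return store.get(tupleId);
    }

    public static void main(String[] args) {
        SelectiveLineageSketch tracker =
            new SelectiveLineageSketch(t -> t.startsWith("ERROR"), new InMemoryStore());
        long skipped = tracker.observe("INFO started");
        long tracked = tracker.observe("ERROR disk full");
        System.out.println(tracker.lookup(tracked)); // prints ERROR disk full
        System.out.println(tracker.lookup(skipped)); // prints null
    }
}
```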

Thks,
Amol


On Mon, Apr 25, 2016 at 12:00 AM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> Hi All,
>
> I have heard of a few use cases where lineage support is asked for. On
> apex, it seems to be an ask for the ability to uniquely track each tuple as
> it flows through the DAG. It further boils down to being able to track
> every tuple going into each operator and the corresponding tuple going out
> of the operator. Here is a quick list I put together to describe some
> requirements for lineage support on apex. Please feel free to improve or
> add to it. Also, please respond with ideas on how we can solve this on the
> apex platform.
>
> When lineage is enabled,
> 1. We should be able to track each tuple as it enters and exits an
> operator. eg: enrichment.
> 2. We should be able to track all the tuples that contributed to a tuple
> that is emitted. eg: dimensions computation.
> 3. We should be able to track all the tuples that contributed to all the
> tuples emitted by the operator. eg: join?
>
> --
>
> Regards,
> Ashwin.
>
