Posted to dev@trafodion.apache.org by "Liu, Ming (Ming)" <mi...@esgyn.cn> on 2016/03/23 03:38:24 UTC

Re: investigating parallel scanner

Just a hint.
IMHO, order is one of the physical properties: there is a sort-related flag in PhysicalProperty, and it is the optimizer that should consider this flag.
There are functions such as FileScan::synthHiveScanPhysicalProperty and FileScan::synthHbaseScanPhysicalProperty; I am not sure whether they are related.

Thanks,
Ming
-----Original Message-----
From: Eric Owhadi [mailto:eric.owhadi@esgyn.com]
Sent: March 23, 2016 9:36
To: dev@trafodion.incubator.apache.org
Subject: RE: investigating parallel scanner

That's indeed what I am looking for. And this is not specific to MDAM: any scan needs to know whether order matters or not.
Eric

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Tuesday, March 22, 2016 6:26 PM
To: dev@trafodion.incubator.apache.org
Subject: RE: investigating parallel scanner

Hi Eric,

Yes, that is true.

The Optimizer sometimes takes advantage of the MDAM ordering (that is, an operator that consumes the rows produced by an MDAM scan may rely on the order). So you'd need a flag in the scan node to know if this is true.

Actually, we may want to change the costing code in the Optimizer to take into account the tradeoff of order + serial behavior vs. unordered + higher parallelism.

Dave
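
The tradeoff described above can be illustrated with a toy cost model. This is a minimal sketch, not Trafodion's costing code: the class ScanPlan, the functions below, and the cost constants are all hypothetical, and the scan is assumed to divide evenly across parallel workers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanPlan:
    """Hypothetical candidate plan; not a Trafodion class."""
    name: str
    preserves_order: bool
    degree_of_parallelism: int


def scan_cost(plan, rows, cost_per_row=1.0):
    # Idealized assumption: scan work divides evenly across workers.
    return rows * cost_per_row / plan.degree_of_parallelism


def sort_cost(rows, cost_per_compare=0.05):
    # Comparison sort modeled as n * log2(n) comparisons.
    return 0.0 if rows < 2 else rows * math.log2(rows) * cost_per_compare


def total_cost(plan, rows, order_required):
    cost = scan_cost(plan, rows)
    # An unordered plan must pay for an explicit sort when order is required.
    if order_required and not plan.preserves_order:
        cost += sort_cost(rows)
    return cost


def choose_plan(plans, rows, order_required):
    # Pick the cheapest candidate under the toy model.
    return min(plans, key=lambda p: total_cost(p, rows, order_required))


serial = ScanPlan("ordered_serial", preserves_order=True, degree_of_parallelism=1)
parallel = ScanPlan("unordered_parallel", preserves_order=False, degree_of_parallelism=8)
```

With these made-up constants, the unordered parallel plan wins whenever order is not required, and the winner when order is required depends on the row count, which is the kind of decision the costing code would have to make.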

-----Original Message-----
From: Eric Owhadi [mailto:eric.owhadi@esgyn.com]
Sent: Tuesday, March 22, 2016 4:13 PM
To: dev@trafodion.incubator.apache.org
Subject: RE: investigating parallel scanner

MDAM ordering is important for the sequencing of probes.
The resulting scans do not feed the "next probe", so in theory, if the probes generated a list of scans to serve, that list could be processed in any order, in parallel, as long as the parent node does not expect to receive data in order.
Anyway, MDAM scan optimization would be a different beast; for now I want to do a V1 that does not deal with MDAM.
Eric

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Tuesday, March 22, 2016 6:08 PM
To: dev@trafodion.incubator.apache.org
Subject: RE: investigating parallel scanner

Hi,

I don't know about a flag.

MDAM, however, does assume that it is scanning things in order, so if MDAM is present that's a clue.

Dave

-----Original Message-----
From: Eric Owhadi [mailto:eric.owhadi@esgyn.com]
Sent: Tuesday, March 22, 2016 3:41 PM
To: dev@trafodion.incubator.apache.org
Subject: investigating parallel scanner

Hello Trafodioneers,

I want to implement a parallel scanner that scans the regions of a table in parallel and consequently does not return rows in order. I have been hunting for a flag (I would have guessed in ComTdbHbaseAccess) that tells me whether the return order matters for a scan operator, so that I know whether the parallel scanner is a candidate or is forbidden.

But so far, I have not been able to locate such a flag.

Any idea if it exists, and if so, where it is?

Thanks in advance for the help,

Eric
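
The behavior in question can be sketched as follows. This is a toy model under stated assumptions, not Trafodion or HBase code: REGIONS is a hypothetical in-memory stand-in for a table split into key-ordered regions, and order_required is the hypothetical flag being asked about.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical table: each inner list is one region, sorted by key,
# and regions are listed in key-range order.
REGIONS = [
    [(1, "a"), (2, "b")],
    [(10, "c"), (11, "d")],
    [(20, "e"), (21, "f")],
]


def scan_region(region):
    # Rows inside a single region come back in key order.
    return list(region)


def scan(regions, order_required):
    if order_required:
        # Serial scan: visiting regions in key-range order keeps global order.
        rows = []
        for region in regions:
            rows.extend(scan_region(region))
        return rows
    # Parallel scan: regions are consumed as they complete, so the
    # global row order is not guaranteed.
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        futures = [pool.submit(scan_region, r) for r in regions]
        rows = []
        for fut in as_completed(futures):
            rows.extend(fut.result())
        return rows
```

Both paths return the same set of rows; only the serial path guarantees that they arrive in key order, which is why the scanner needs to know whether the parent relies on it.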

RE: investigating parallel scanner

Posted by Selva Govindarajan <se...@esgyn.com>.
My understanding is that the Trafodion optimizer does not yet capitalize on
the HBase feature of returning rows in order. So, if tuples need to flow in
sorted order, it adds a sort operator.

Selva
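
As an illustration of a scan declaring its order as a physical property and the optimizer adding a sort only when needed, here is a minimal sketch. PhysProp, Scan, Sort, satisfies and enforce are hypothetical names for this example; they are not Trafodion's PhysicalProperty class or its optimizer rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PhysProp:
    # Hypothetical property: None means no guaranteed row order.
    sort_key: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class Scan:
    table: str
    delivers: PhysProp


@dataclass(frozen=True)
class Sort:
    child: object
    delivers: PhysProp


def satisfies(delivered, required):
    if required.sort_key is None:
        return True  # the parent does not care about order
    if delivered.sort_key is None:
        return False  # the child guarantees nothing
    # A delivered order is good enough if the required key is its prefix.
    return delivered.sort_key[:len(required.sort_key)] == required.sort_key


def enforce(plan, required):
    # The optimizer, not the scan operator, decides whether a sort is needed.
    if satisfies(plan.delivers, required):
        return plan
    return Sort(child=plan, delivers=required)
```

In this model a parallel scan would simply declare PhysProp() with no sort key, and any parent that requires order gets a Sort placed on top of it.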

-----Original Message-----
From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Tuesday, March 22, 2016 7:46 PM
To: dev@trafodion.incubator.apache.org
Subject: Re: investigating parallel scanner

Hi again. As I understand it, an operator just needs to set the property to
tell the optimizer whether or not it can return data in order. It is the
optimizer that decides which plan to choose. So there should not be a flag in
the TDB that the scan operator checks to decide whether or not to scan in
order.

Just my 2 cents.


Re: investigating parallel scanner

Posted by "Liu, Ming (Ming)" <mi...@esgyn.cn>.
Hi again. As I understand it, an operator just needs to set the property to tell the optimizer whether or not it can return data in order. It is the optimizer that decides which plan to choose. So there should not be a flag in the TDB that the scan operator checks to decide whether or not to scan in order.

Just my 2 cents.
