You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@trafodion.apache.org by "Liu, Ming (Ming)" <mi...@esgyn.cn> on 2017/10/21 01:54:06 UTC

MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming

RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Posted by Dave Birdsall <da...@esgyn.com>.

Hi Eric,

As I explained, the compiler does use probes (that is, sparse access) for character types in all cases.

Dave

-----Original Message-----
From: Eric Owhadi [mailto:eric.owhadi@esgyn.com] 
Sent: Monday, October 23, 2017 9:49 AM
To: user@trafodion.incubator.apache.org; dev@trafodion.incubator.apache.org
Subject: RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi Dave and Ming,
I am not sure I understand why we are not just simply using a probe asking for strictly greater  in all cases>?
Let say after "ABC", the next probe is where x > "ABC" limit 1?
Eric

From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, October 23, 2017 11:35 AM
To: user@trafodion.incubator.apache.org; dev@trafodion.incubator.apache.org
Subject: RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I believe the compiler today only uses "dense" for integer data types. It might do it for intervals too; I am not sure.

Using "dense" depends on being able to add 1. In principle this is possible for any data type (even float!). But it can be tricky in practice. For example, for VARCHARs: Suppose you have a VARCHAR(40) data type, and you have the string "ABC". What is the next string after that? Because of blank-padding semantics, "ABC" = "ABC<37 blanks>". So, logically speaking, you pad out "ABC" to a 40-character string, then you can add one to the last character. And you are off to the races from there.

Now, how many possible strings exist between "ABC" and "ABD"? Well, there are 37 characters after that (assuming ISO 8859/1), each with bit patterns ranging from hex zero to hex 'ff', that is 256 each. So there are 256^37 = 2^(8*37) = 2^296 which is a bit more than 10^89  which is an enormous number. At a nano-second a value, that would take far longer than the age of the universe to traverse.

Dense makes sense when the UEC of an interval of values is very near to the maximum possible number of values in the interval for its data type.

The performance tradeoff in using sparse vs. dense can be understood by looking at the recursive nature of the MDAM algorithm:

Suppose I have a two-column key, A and B, both INTEGER data types.

Suppose I have a query, SELECT * FROM T WHERE A > 10 and A < 100 and B = 12;

Suppose we have chosen an MDAM plan that traverses both A and B.

If we choose dense access on A, then at run time, we will do the following direct accesses: [11,12], [12,12], [13,12], .... [99,12]. That's 89 accesses.

If we choose sparse access on A, then at run time, we will do a probe to find the first value of A that exists after 10. Suppose that is 20. Then we will do a direct access on [20,12]. Then we will do a probe to find the first value of A that exists after 20. Suppose that is 92. Then we will do a direct access on [92,12]. Then we will do a probe to find the first value of A after 92 but before 100. Suppose there are no more. How many accesses did we do? Just 5. Three of those were probes, and two were direct accesses.

In that example, sparse is much better.

But suppose all of the values of A between 10 and 100 exist in the database at run time. If we choose sparse, then we'd to a probe to find the first value, which is 11. Then do direct access on [11,12]. Then do a probe to find the first value after 11. That's 12. Then do a direct access on 12. And so on. We'd do 178 accessses, 89 of which are probes and 89 of which are direct accesses. So in that case, doing dense access is better.

The predecessor product had a notion of "adaptive dense", where we'd start out in dense mode, and if we didn't find any values after two or three iterations, we'd switch to sparse mode. Then switch back once we found a value. I am not sure if Trafodion has the "adaptive dense" implementation.

Dave

From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Friday, October 20, 2017 6:54 PM
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>; user@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming

RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Posted by Dave Birdsall <da...@esgyn.com>.

Hi Eric,

As I explained, the compiler does use probes (that is, sparse access) for character types in all cases.

Dave

-----Original Message-----
From: Eric Owhadi [mailto:eric.owhadi@esgyn.com] 
Sent: Monday, October 23, 2017 9:49 AM
To: user@trafodion.incubator.apache.org; dev@trafodion.incubator.apache.org
Subject: RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi Dave and Ming,
I am not sure I understand why we are not just simply using a probe asking for strictly greater  in all cases>?
Let say after "ABC", the next probe is where x > "ABC" limit 1?
Eric

From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, October 23, 2017 11:35 AM
To: user@trafodion.incubator.apache.org; dev@trafodion.incubator.apache.org
Subject: RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I believe the compiler today only uses "dense" for integer data types. It might do it for intervals too; I am not sure.

Using "dense" depends on being able to add 1. In principle this is possible for any data type (even float!). But it can be tricky in practice. For example, for VARCHARs: Suppose you have a VARCHAR(40) data type, and you have the string "ABC". What is the next string after that? Because of blank-padding semantics, "ABC" = "ABC<37 blanks>". So, logically speaking, you pad out "ABC" to a 40-character string, then you can add one to the last character. And you are off to the races from there.

Now, how many possible strings exist between "ABC" and "ABD"? Well, there are 37 characters after that (assuming ISO 8859/1), each with bit patterns ranging from hex zero to hex 'ff', that is 256 each. So there are 256^37 = 2^(8*37) = 2^296 which is a bit more than 10^89  which is an enormous number. At a nano-second a value, that would take far longer than the age of the universe to traverse.

Dense makes sense when the UEC of an interval of values is very near to the maximum possible number of values in the interval for its data type.

The performance tradeoff in using sparse vs. dense can be understood by looking at the recursive nature of the MDAM algorithm:

Suppose I have a two-column key, A and B, both INTEGER data types.

Suppose I have a query, SELECT * FROM T WHERE A > 10 and A < 100 and B = 12;

Suppose we have chosen an MDAM plan that traverses both A and B.

If we choose dense access on A, then at run time, we will do the following direct accesses: [11,12], [12,12], [13,12], .... [99,12]. That's 89 accesses.

If we choose sparse access on A, then at run time, we will do a probe to find the first value of A that exists after 10. Suppose that is 20. Then we will do a direct access on [20,12]. Then we will do a probe to find the first value of A that exists after 20. Suppose that is 92. Then we will do a direct access on [92,12]. Then we will do a probe to find the first value of A after 92 but before 100. Suppose there are no more. How many accesses did we do? Just 5. Three of those were probes, and two were direct accesses.

In that example, sparse is much better.

But suppose all of the values of A between 10 and 100 exist in the database at run time. If we choose sparse, then we'd to a probe to find the first value, which is 11. Then do direct access on [11,12]. Then do a probe to find the first value after 11. That's 12. Then do a direct access on 12. And so on. We'd do 178 accessses, 89 of which are probes and 89 of which are direct accesses. So in that case, doing dense access is better.

The predecessor product had a notion of "adaptive dense", where we'd start out in dense mode, and if we didn't find any values after two or three iterations, we'd switch to sparse mode. Then switch back once we found a value. I am not sure if Trafodion has the "adaptive dense" implementation.

Dave

From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Friday, October 20, 2017 6:54 PM
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>; user@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming

RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Posted by Eric Owhadi <er...@esgyn.com>.

Hi Dave and Ming,
I am not sure I understand why we are not just simply using a probe asking for strictly greater  in all cases>?
Let say after "ABC", the next probe is where x > "ABC" limit 1?
Eric

From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, October 23, 2017 11:35 AM
To: user@trafodion.incubator.apache.org; dev@trafodion.incubator.apache.org
Subject: RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I believe the compiler today only uses "dense" for integer data types. It might do it for intervals too; I am not sure.

Using "dense" depends on being able to add 1. In principle this is possible for any data type (even float!). But it can be tricky in practice. For example, for VARCHARs: Suppose you have a VARCHAR(40) data type, and you have the string "ABC". What is the next string after that? Because of blank-padding semantics, "ABC" = "ABC<37 blanks>". So, logically speaking, you pad out "ABC" to a 40-character string, then you can add one to the last character. And you are off to the races from there.

Now, how many possible strings exist between "ABC" and "ABD"? Well, there are 37 characters after that (assuming ISO 8859/1), each with bit patterns ranging from hex zero to hex 'ff', that is 256 each. So there are 256^37 = 2^(8*37) = 2^296 which is a bit more than 10^89  which is an enormous number. At a nano-second a value, that would take far longer than the age of the universe to traverse.

Dense makes sense when the UEC of an interval of values is very near to the maximum possible number of values in the interval for its data type.

The performance tradeoff in using sparse vs. dense can be understood by looking at the recursive nature of the MDAM algorithm:

Suppose I have a two-column key, A and B, both INTEGER data types.

Suppose I have a query, SELECT * FROM T WHERE A > 10 and A < 100 and B = 12;

Suppose we have chosen an MDAM plan that traverses both A and B.

If we choose dense access on A, then at run time, we will do the following direct accesses: [11,12], [12,12], [13,12], .... [99,12]. That's 89 accesses.

If we choose sparse access on A, then at run time, we will do a probe to find the first value of A that exists after 10. Suppose that is 20. Then we will do a direct access on [20,12]. Then we will do a probe to find the first value of A that exists after 20. Suppose that is 92. Then we will do a direct access on [92,12]. Then we will do a probe to find the first value of A after 92 but before 100. Suppose there are no more. How many accesses did we do? Just 5. Three of those were probes, and two were direct accesses.

In that example, sparse is much better.

But suppose all of the values of A between 10 and 100 exist in the database at run time. If we choose sparse, then we'd to a probe to find the first value, which is 11. Then do direct access on [11,12]. Then do a probe to find the first value after 11. That's 12. Then do a direct access on 12. And so on. We'd do 178 accessses, 89 of which are probes and 89 of which are direct accesses. So in that case, doing dense access is better.

The predecessor product had a notion of "adaptive dense", where we'd start out in dense mode, and if we didn't find any values after two or three iterations, we'd switch to sparse mode. Then switch back once we found a value. I am not sure if Trafodion has the "adaptive dense" implementation.

Dave

From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Friday, October 20, 2017 6:54 PM
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>; user@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming

RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Posted by Eric Owhadi <er...@esgyn.com>.

Hi Dave and Ming,
I am not sure I understand why we are not just simply using a probe asking for strictly greater  in all cases>?
Let say after "ABC", the next probe is where x > "ABC" limit 1?
Eric

From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Monday, October 23, 2017 11:35 AM
To: user@trafodion.incubator.apache.org; dev@trafodion.incubator.apache.org
Subject: RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I believe the compiler today only uses "dense" for integer data types. It might do it for intervals too; I am not sure.

Using "dense" depends on being able to add 1. In principle this is possible for any data type (even float!). But it can be tricky in practice. For example, for VARCHARs: Suppose you have a VARCHAR(40) data type, and you have the string "ABC". What is the next string after that? Because of blank-padding semantics, "ABC" = "ABC<37 blanks>". So, logically speaking, you pad out "ABC" to a 40-character string, then you can add one to the last character. And you are off to the races from there.

Now, how many possible strings exist between "ABC" and "ABD"? Well, there are 37 characters after that (assuming ISO 8859/1), each with bit patterns ranging from hex zero to hex 'ff', that is 256 each. So there are 256^37 = 2^(8*37) = 2^296 which is a bit more than 10^89  which is an enormous number. At a nano-second a value, that would take far longer than the age of the universe to traverse.

Dense makes sense when the UEC of an interval of values is very near to the maximum possible number of values in the interval for its data type.

The performance tradeoff in using sparse vs. dense can be understood by looking at the recursive nature of the MDAM algorithm:

Suppose I have a two-column key, A and B, both INTEGER data types.

Suppose I have a query, SELECT * FROM T WHERE A > 10 and A < 100 and B = 12;

Suppose we have chosen an MDAM plan that traverses both A and B.

If we choose dense access on A, then at run time, we will do the following direct accesses: [11,12], [12,12], [13,12], .... [99,12]. That's 89 accesses.

If we choose sparse access on A, then at run time, we will do a probe to find the first value of A that exists after 10. Suppose that is 20. Then we will do a direct access on [20,12]. Then we will do a probe to find the first value of A that exists after 20. Suppose that is 92. Then we will do a direct access on [92,12]. Then we will do a probe to find the first value of A after 92 but before 100. Suppose there are no more. How many accesses did we do? Just 5. Three of those were probes, and two were direct accesses.

In that example, sparse is much better.

But suppose all of the values of A between 10 and 100 exist in the database at run time. If we choose sparse, then we'd to a probe to find the first value, which is 11. Then do direct access on [11,12]. Then do a probe to find the first value after 11. That's 12. Then do a direct access on 12. And so on. We'd do 178 accessses, 89 of which are probes and 89 of which are direct accesses. So in that case, doing dense access is better.

The predecessor product had a notion of "adaptive dense", where we'd start out in dense mode, and if we didn't find any values after two or three iterations, we'd switch to sparse mode. Then switch back once we found a value. I am not sure if Trafodion has the "adaptive dense" implementation.

Dave

From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Friday, October 20, 2017 6:54 PM
To: dev@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>; user@trafodion.incubator.apache.org<ma...@trafodion.incubator.apache.org>
Subject: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming

RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Posted by Dave Birdsall <da...@esgyn.com>.

Hi,

I believe the compiler today only uses "dense" for integer data types. It might do it for intervals too; I am not sure.

Using "dense" depends on being able to add 1. In principle this is possible for any data type (even float!). But it can be tricky in practice. For example, for VARCHARs: Suppose you have a VARCHAR(40) data type, and you have the string "ABC". What is the next string after that? Because of blank-padding semantics, "ABC" = "ABC<37 blanks>". So, logically speaking, you pad out "ABC" to a 40-character string, then you can add one to the last character. And you are off to the races from there.

Now, how many possible strings exist between "ABC" and "ABD"? Well, there are 37 characters after that (assuming ISO 8859/1), each with bit patterns ranging from hex zero to hex 'ff', that is 256 each. So there are 256^37 = 2^(8*37) = 2^296 which is a bit more than 10^89  which is an enormous number. At a nano-second a value, that would take far longer than the age of the universe to traverse.

Dense makes sense when the UEC of an interval of values is very near to the maximum possible number of values in the interval for its data type.

The performance tradeoff in using sparse vs. dense can be understood by looking at the recursive nature of the MDAM algorithm:

Suppose I have a two-column key, A and B, both INTEGER data types.

Suppose I have a query, SELECT * FROM T WHERE A > 10 and A < 100 and B = 12;

Suppose we have chosen an MDAM plan that traverses both A and B.

If we choose dense access on A, then at run time, we will do the following direct accesses: [11,12], [12,12], [13,12], .... [99,12]. That's 89 accesses.

If we choose sparse access on A, then at run time, we will do a probe to find the first value of A that exists after 10. Suppose that is 20. Then we will do a direct access on [20,12]. Then we will do a probe to find the first value of A that exists after 20. Suppose that is 92. Then we will do a direct access on [92,12]. Then we will do a probe to find the first value of A after 92 but before 100. Suppose there are no more. How many accesses did we do? Just 5. Three of those were probes, and two were direct accesses.

In that example, sparse is much better.

But suppose all of the values of A between 10 and 100 exist in the database at run time. If we choose sparse, then we'd to a probe to find the first value, which is 11. Then do direct access on [11,12]. Then do a probe to find the first value after 11. That's 12. Then do a direct access on 12. And so on. We'd do 178 accessses, 89 of which are probes and 89 of which are direct accesses. So in that case, doing dense access is better.

The predecessor product had a notion of "adaptive dense", where we'd start out in dense mode, and if we didn't find any values after two or three iterations, we'd switch to sparse mode. Then switch back once we found a value. I am not sure if Trafodion has the "adaptive dense" implementation.

Dave


From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Friday, October 20, 2017 6:54 PM
To: dev@trafodion.incubator.apache.org; user@trafodion.incubator.apache.org
Subject: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming

RE: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Posted by Dave Birdsall <da...@esgyn.com>.

Hi,

I believe the compiler today only uses "dense" for integer data types. It might do it for intervals too; I am not sure.

Using "dense" depends on being able to add 1. In principle this is possible for any data type (even float!). But it can be tricky in practice. For example, for VARCHARs: Suppose you have a VARCHAR(40) data type, and you have the string "ABC". What is the next string after that? Because of blank-padding semantics, "ABC" = "ABC<37 blanks>". So, logically speaking, you pad out "ABC" to a 40-character string, then you can add one to the last character. And you are off to the races from there.

Now, how many possible strings exist between "ABC" and "ABD"? Well, there are 37 characters after that (assuming ISO 8859/1), each with bit patterns ranging from hex zero to hex 'ff', that is 256 each. So there are 256^37 = 2^(8*37) = 2^296 which is a bit more than 10^89  which is an enormous number. At a nano-second a value, that would take far longer than the age of the universe to traverse.

Dense makes sense when the UEC of an interval of values is very near to the maximum possible number of values in the interval for its data type.

The performance tradeoff in using sparse vs. dense can be understood by looking at the recursive nature of the MDAM algorithm:

Suppose I have a two-column key, A and B, both INTEGER data types.

Suppose I have a query, SELECT * FROM T WHERE A > 10 and A < 100 and B = 12;

Suppose we have chosen an MDAM plan that traverses both A and B.

If we choose dense access on A, then at run time, we will do the following direct accesses: [11,12], [12,12], [13,12], .... [99,12]. That's 89 accesses.

If we choose sparse access on A, then at run time, we will do a probe to find the first value of A that exists after 10. Suppose that is 20. Then we will do a direct access on [20,12]. Then we will do a probe to find the first value of A that exists after 20. Suppose that is 92. Then we will do a direct access on [92,12]. Then we will do a probe to find the first value of A after 92 but before 100. Suppose there are no more. How many accesses did we do? Just 5. Three of those were probes, and two were direct accesses.

In that example, sparse is much better.

But suppose all of the values of A between 10 and 100 exist in the database at run time. If we choose sparse, then we'd to a probe to find the first value, which is 11. Then do direct access on [11,12]. Then do a probe to find the first value after 11. That's 12. Then do a direct access on 12. And so on. We'd do 178 accessses, 89 of which are probes and 89 of which are direct accesses. So in that case, doing dense access is better.

The predecessor product had a notion of "adaptive dense", where we'd start out in dense mode, and if we didn't find any values after two or three iterations, we'd switch to sparse mode. Then switch back once we found a value. I am not sure if Trafodion has the "adaptive dense" implementation.

Dave


From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Friday, October 20, 2017 6:54 PM
To: dev@trafodion.incubator.apache.org; user@trafodion.incubator.apache.org
Subject: MDAM has sparse and dense mode, for string column, is it possible to use dense mode?

Hi,

I know Trafodion can use MDAM to reduce scanned rows. For probe, there are two ways: sparse and dense. If the column is INT, dense probe is just +1 operation, if the column is VARCHAR or FLOAT, how to do dense probe? Or we should say dense probe is only for INT data type?

thanks,
Ming