Posted to user@spark.apache.org by 开心延年 <mu...@qq.com> on 2016/01/28 13:28:42 UTC

Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?

We usually write SQL like the following:

select count(*) from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='女' or ydb_province='LIAONING' or ydb_day>='20151217') limit 10

Spark does not push predicates down into TableScanDesc.FILTER_EXPR_CONF_STR, which means every query becomes a full scan and cannot use the index (similar to what the HBase storage handler relies on).








------------------ Original Message ------------------
From: "开心延年" <mu...@qq.com>
Sent: Thursday, January 28, 2016, 7:27 PM
To: "user" <us...@spark.apache.org>; "dev" <de...@spark.apache.org>

Subject: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?



Dear Spark community,
I am testing a StorageHandler on Spark SQL,
but I find that TableScanDesc.FILTER_EXPR_CONF_STR is missing. I need it; is there anywhere I could find it?
I really want to get the filter information from Spark SQL so that I can pre-filter with my index.
So where is TableScanDesc.FILTER_EXPR_CONF_STR (hive.io.filter.expr.serialized)? Is it missing, or has it been replaced by some other mechanism? Thanks, everybody.


For example, I made a custom StorageHandler, as in Hive:

create table xxx(...)
STORED BY 'cn.net.ycloud.ydb.handle.Ya100StorageHandler' 
TBLPROPERTIES(
"ya100.handler.master"="101.200.130.48:8080",
"ya100.handler.table.name"="ydb_example_shu",
"ya100.handler.columns.mapping"="phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day,content,ydbpartion,ya100_pipe"
)
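As an aside, the ya100.handler.columns.mapping property above is just a comma-separated column list; a handler would typically split it into an ordered name-to-position map before wiring up its record reader. A minimal plain-Java sketch of that parsing (independent of the Hive API; names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnsMapping {
    /** Split a comma-separated mapping string into column name -> ordinal position. */
    public static Map<String, Integer> parse(String mapping) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        String[] names = mapping.split(",");
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i].trim(), i);
        }
        return positions;
    }
}
```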

In the Ya100StorageHandler code,
I want to use TableScanDesc.FILTER_EXPR_CONF_STR like this:

String filterExprSerialized = conf.get(TableScanDesc.FILTER_EXPR_CONF_STR);
if (filterExprSerialized == null) {
    // Alternatively: throw new IOException("cannot find a filter condition in your SQL; you must at least specify a field such as ydbpartion");
    return "";
} else {
    LOG.info(filterExprSerialized);
    ExprNodeGenericFuncDesc filterExpr = Utilities.deserializeExpression(filterExprSerialized);
    LOG.info(filterExpr);
    try {
        return Ya100Utils.parserFilter(filterExpr, info);
    } catch (Throwable e) {
        throw new IOException(e);
    }
}
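Ya100Utils.parserFilter is not shown in the mail; presumably it walks the deserialized ExprNodeGenericFuncDesc tree and renders it into a form the index can evaluate. As a rough stand-in for that idea, here is how such a tree walk can rebuild an infix filter string, using a simplified hypothetical node class rather than Hive's real expression classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Simplified stand-in for Hive's expression-node tree (illustration only). */
public class FilterRender {
    static class Node {
        final String op;          // e.g. "=", ">=", "and", "or"; null for leaves
        final String value;       // column name or literal, for leaves
        final List<Node> children;
        Node(String value) { this.op = null; this.value = value; this.children = List.of(); }
        Node(String op, Node... children) { this.op = op; this.value = null; this.children = Arrays.asList(children); }
    }

    /** Recursively render the tree as a parenthesized infix filter expression. */
    static String render(Node n) {
        if (n.op == null) {
            return n.value;
        }
        return "(" + n.children.stream()
                .map(FilterRender::render)
                .collect(Collectors.joining(" " + n.op + " ")) + ")";
    }
}
```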

Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?

Posted by 开心延年 <mu...@qq.com>.
If Spark SQL supported TableScanDesc.FILTER_EXPR_CONF_STR as Hive does,

we could write SQL like this:

select ydb_sex from ydb_example_shu where ydbpartion='20151110' limit 10
select ydb_sex from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217') limit 10
select count(*) from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217') limit 10


If Spark SQL cannot support TableScanDesc.FILTER_EXPR_CONF_STR as Hive does, we would have to write SQL like this instead:

set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110';
select ydb_sex from ydb_example_shu  limit 10

set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217');
select ydb_sex from ydb_example_shu  limit 10

set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217');
select count(*) from ydb_example_shu limit 10

set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110' and (ydb_sex in ('男','女','张三','李四'));
select ydb_sex,ydb_province from ydb_example_shu   limit 10

set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110';
select count(*) from ydb_example_shu   limit 10
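With this workaround, the handler has to read the per-table session property itself whenever no pushed-down filter is present. A plain-Java sketch of that fallback lookup (the ya100.spark.filter.<table> naming scheme is the one proposed above; hive.io.filter.expr.serialized is the string value of TableScanDesc.FILTER_EXPR_CONF_STR, as noted earlier in the thread):

```java
import java.util.Map;

public class FilterFallback {
    /**
     * Prefer Hive's pushed-down filter (hive.io.filter.expr.serialized);
     * otherwise fall back to the session property ya100.spark.filter.<table>.
     * Returns "" when neither is set.
     */
    public static String filterFor(Map<String, String> conf, String table) {
        String pushed = conf.get("hive.io.filter.expr.serialized");
        if (pushed != null) {
            return pushed; // would normally be deserialized into an expression tree
        }
        return conf.getOrDefault("ya100.spark.filter." + table, "");
    }
}
```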



------------------ Original Message ------------------
From: "开心延年" <mu...@qq.com>
Sent: Thursday, January 28, 2016, 8:28 PM
To: "开心延年" <mu...@qq.com>; "user" <us...@spark.apache.org>; "dev" <de...@spark.apache.org>

Subject: Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?



