Posted to user@spark.apache.org by 开心延年 <mu...@qq.com> on 2016/01/28 12:27:19 UTC
Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?
Dear Spark community,
I am testing a StorageHandler on Spark SQL, but I find that TableScanDesc.FILTER_EXPR_CONF_STR is missing. I need it; is there anywhere I could find it?
I really want to get the filter information from Spark SQL so that I can pre-filter using my index.
So where is TableScanDesc.FILTER_EXPR_CONF_STR (hive.io.filter.expr.serialized)? Is it missing, or has it been replaced by another mechanism? Thanks, everybody.
For example, I made a custom StorageHandler, as in Hive:
CREATE TABLE xxx(...)
STORED BY 'cn.net.ycloud.ydb.handle.Ya100StorageHandler'
TBLPROPERTIES(
"ya100.handler.master"="101.200.130.48:8080",
"ya100.handler.table.name"="ydb_example_shu",
"ya100.handler.columns.mapping"="phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day,content,ydbpartion,ya100_pipe"
)
In the Ya100StorageHandler code, I want to use TableScanDesc.FILTER_EXPR_CONF_STR like this:
String filterExprSerialized = conf.get(TableScanDesc.FILTER_EXPR_CONF_STR);
if (filterExprSerialized == null) {
    return "";
    // throw new IOException("cannot find a filter condition in your SQL; at a minimum you must specify a field such as ydbpartion");
} else {
    LOG.info(filterExprSerialized);
    ExprNodeGenericFuncDesc filterExpr = Utilities.deserializeExpression(filterExprSerialized);
    LOG.info(filterExpr);
    try {
        return Ya100Utils.parserFilter(filterExpr, info);
    } catch (Throwable e) {
        throw new IOException(e);
    }
}
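For reference, the kind of translation a method like Ya100Utils.parserFilter performs can be sketched with a self-contained expression walker. The ExprNode class and toFilterString method below are hypothetical stand-ins for Hive's ExprNodeGenericFuncDesc and the handler's own parser; they only illustrate the recursive AND/OR/leaf traversal that turns the pushed-down expression tree back into an index-friendly filter string.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for Hive's expression tree (ExprNodeGenericFuncDesc).
// "and"/"or" nodes carry children; leaf nodes carry column, op, value.
class ExprNode {
    String op;                 // "and", "or", "=", ">=", ...
    String column;             // set on leaf nodes only
    String value;              // set on leaf nodes only
    List<ExprNode> children;   // set on and/or nodes only

    static ExprNode leaf(String column, String op, String value) {
        ExprNode n = new ExprNode();
        n.column = column;
        n.op = op;
        n.value = value;
        return n;
    }

    static ExprNode logical(String op, ExprNode... kids) {
        ExprNode n = new ExprNode();
        n.op = op;
        n.children = Arrays.asList(kids);
        return n;
    }
}

public class FilterSketch {
    // Recursively render the tree as a filter string, mirroring what a
    // storage handler's pre-filter translation does with the expression
    // it deserializes from FILTER_EXPR_CONF_STR.
    static String toFilterString(ExprNode node) {
        if (node.children == null) {
            return node.column + node.op + "'" + node.value + "'";
        }
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) sb.append(" ").append(node.op).append(" ");
            sb.append(toFilterString(node.children.get(i)));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // ydbpartion='20151110' and (ydb_sex='x' or ydb_day>='20151217')
        ExprNode tree = ExprNode.logical("and",
            ExprNode.leaf("ydbpartion", "=", "20151110"),
            ExprNode.logical("or",
                ExprNode.leaf("ydb_sex", "=", "x"),
                ExprNode.leaf("ydb_day", ">=", "20151217")));
        // prints (ydbpartion='20151110' and (ydb_sex='x' or ydb_day>='20151217'))
        System.out.println(toFilterString(tree));
    }
}
```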
Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?
Posted by 开心延年 <mu...@qq.com>.
If Spark supported TableScanDesc.FILTER_EXPR_CONF_STR as Hive does, we could write SQL like this:
select ydb_sex from ydb_example_shu where ydbpartion='20151110' limit 10
select ydb_sex from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217') limit 10
select count(*) from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217') limit 10
If Spark does not support TableScanDesc.FILTER_EXPR_CONF_STR as Hive does, we have to write SQL like this instead:
set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110';
select ydb_sex from ydb_example_shu limit 10
set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217');
select ydb_sex from ydb_example_shu limit 10
set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110' and (ydb_sex='女' or ydb_province='辽宁' or ydb_day>='20151217');
select count(*) from ydb_example_shu limit 10
set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110' and (ydb_sex in ('男','女','张三','李四'));
select ydb_sex,ydb_province from ydb_example_shu limit 10
set ya100.spark.filter.ydb_example_shu=ydbpartion='20151110';
select count(*) from ydb_example_shu limit 10
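The session-variable workaround above can be sketched as a fallback lookup inside the handler: prefer the standard serialized-filter key, and only when it is absent read the per-table property. The conf map and the ya100.spark.filter.&lt;table&gt; key follow the examples in this mail; the resolveFilter helper itself is a hypothetical illustration, not the actual Ya100 code.

```java
import java.util.HashMap;
import java.util.Map;

public class FilterFallback {
    // Hive's key for the serialized filter expression
    // (the value of TableScanDesc.FILTER_EXPR_CONF_STR).
    static final String FILTER_EXPR_KEY = "hive.io.filter.expr.serialized";

    // Prefer the pushed-down filter; fall back to the per-table
    // "set ya100.spark.filter.<table>=..." workaround when it is absent.
    static String resolveFilter(Map<String, String> conf, String table) {
        String pushed = conf.get(FILTER_EXPR_KEY);
        if (pushed != null) {
            return pushed;
        }
        String manual = conf.get("ya100.spark.filter." + table);
        return manual != null ? manual : "";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("ya100.spark.filter.ydb_example_shu", "ydbpartion='20151110'");
        // prints ydbpartion='20151110'
        System.out.println(resolveFilter(conf, "ydb_example_shu"));
    }
}
```

The drawback, of course, is that the filter is no longer derived from the query itself, so it must be kept in sync with each statement by hand.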
Re: Why does Spark SQL miss the TableScanDesc.FILTER_EXPR_CONF_STR parameter when I move a Hive table to Spark?
Posted by 开心延年 <mu...@qq.com>.
We usually write SQL like the one below:
select count(*) from ydb_example_shu where ydbpartion='20151110' and (ydb_sex='' or ydb_province='LIAONING' or ydb_day>='20151217') limit 10
Spark does not push predicates down through TableScanDesc.FILTER_EXPR_CONF_STR, which means every query becomes a full scan and cannot use the index (similar to what the HBase storage handler relies on).