Posted to user@spark.apache.org by Shailesh Birari <sb...@gmail.com> on 2015/03/20 02:30:45 UTC
Spark SQL Self join with aggregate
Hello,
I want to use Spark SQL to aggregate some columns of my data.
e.g. I have a large dataset with columns:
time, src, dst, val1, val2
I want to calculate sum(val1) and sum(val2) for all unique pairs of src and
dst.
I tried forming this SQL query:
SELECT a.time, a.src, a.dst, sum(a.val1), sum(a.val2) from table a, table
b where a.src = b.src and a.dst = b.dst
I know I am doing something wrong here.
Can you please let me know whether it is doable, and how?
Thanks,
Shailesh
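[Editor's note: the aggregation described above needs only a GROUP BY, not a self-join. A minimal sketch of the grouped query, run against an in-memory SQLite table rather than Spark (the GROUP BY semantics are the same; the table name `t` and the sample rows are made up for illustration, while the column names follow the post):]

```python
import sqlite3

# In-memory table mirroring the columns from the post: time, src, dst, val1, val2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time INTEGER, src TEXT, dst TEXT, val1 REAL, val2 REAL)")
rows = [
    (1, "a", "b", 1.0, 10.0),
    (2, "a", "b", 2.0, 20.0),
    (3, "a", "c", 5.0, 50.0),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

# One row per unique (src, dst) pair, with the sums of val1 and val2.
result = conn.execute(
    "SELECT src, dst, SUM(val1), SUM(val2) FROM t GROUP BY src, dst ORDER BY src, dst"
).fetchall()
print(result)  # [('a', 'b', 3.0, 30.0), ('a', 'c', 5.0, 50.0)]
```

The same SQL string, run through Spark SQL against a registered table, should produce the equivalent result.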
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Self-join-with-agreegate-tp22151.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark SQL Self join with aggregate
Posted by SachinJanani <sa...@gmail.com>.
I am not sure whether this is possible, but I have tried something like:
SELECT time, src, dst, sum(val1), sum(val2) FROM table GROUP BY
src, dst;
and it works. I think it will give the same answer you are expecting.
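[Editor's note: `time` appears in the SELECT list but not in the GROUP BY; Spark SQL will generally reject that with an analysis error, since only grouped columns and aggregates may appear there. What the grouped query computes, minus the `time` column, can be sketched in plain Python (the rows below are made up for illustration):]

```python
from collections import defaultdict

# Rows of (time, src, dst, val1, val2); values are illustrative only.
rows = [
    (1, "a", "b", 1.0, 10.0),
    (2, "a", "b", 2.0, 20.0),
    (3, "a", "c", 5.0, 50.0),
]

# Accumulate sums per unique (src, dst) pair -- the effect of GROUP BY src, dst.
sums = defaultdict(lambda: [0.0, 0.0])
for _time, src, dst, val1, val2 in rows:
    sums[(src, dst)][0] += val1
    sums[(src, dst)][1] += val2

for (src, dst), (s1, s2) in sorted(sums.items()):
    print(src, dst, s1, s2)
```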
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Self-join-with-agreegate-tp22151p22378.html
RE: Spark SQL Self join with aggregate
Posted by "Cheng, Hao" <ha...@intel.com>.
Not so sure of your intention, but something like "SELECT src, dst, sum(val1), sum(val2) FROM table GROUP BY src, dst"?
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org