You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@zeppelin.apache.org by Jonathan Davidson <jm...@gmail.com> on 2017/06/06 20:06:14 UTC

%sql counts from multiple columns by checkbox

Hello all, first time poster! I tried looking in forums and documentation
but nothing seems to suit the specific nature of my question.

I am trying to display counts based on checkbox functionality in
Zeppelin.The base table can be pretty simple:
%spark
case class Data(a: Int, b: String)
val df = Seq(
  Data(1, "one"),
  Data(2, "two"),
  Data(3,null)).toDF()
df.collect()
df.registerTempTable("table")

I want counts in SQL:
%sql
SELECT COUNT(a),COUNT(b) FROM table
--gives 3 and 2, respectively

I would think to use
SELECT COUNT(${checkbox:test=a,a|b}) FROM table
However, the problem is that it doesn't scale to multiple checked values at
the same time. If you use more than one, it comes out as the equivalent of
SELECT COUNT(a,b) FROM ds2
which returns one value of 2; I want two values 3 and 2 as stated above

Basically, I've tried turning the values into COUNT(a) and COUNT(b) but it
gets confused:
SELECT ${checkbox(whatever):test=COUNT(a)|COUNT(b),COUNT(a)|COUNT(b)} FROM
ds2
cannot recognize input near 'FROM' 'ds2' '<EOF>' in select clause; line 1
pos 8

I've also tried simple manipulation like adding stuff in parenthesis after
${checkbox...
SELECT ${checkbox(whatever):test=a|b,a|b} FROM ds2
cannot resolve 'awhateverb' given input columns: [a, b]; line 1 pos 7

Does anyone have advice?

Thanks,
Jonathan

Re: %sql counts from multiple columns by checkbox

Posted by Jonathan Davidson <jm...@gmail.com>.
For anyone facing the same issue, I've sort of solved my issue by trying to
convert the entire statement into a string. Seems like it should be
unnecessary, but feel free to offer a better method!

[image: Inline image 1]

Code:
%spark
case class Data(a: Int, b: String)
val ds2 = Seq(
  Data(1, "one"),
  Data(2, "two"),
  Data(3,null)).toDF()

ds2.collect()
ds2.registerTempTable("ds2")

%spark
val options = Seq(("a","a"),("b","b"))
val selectstatement = "SELECT
COUNT("+z.checkbox("columns",options).mkString("), COUNT(")+") FROM ds2"
println(selectstatement)

%spark
ds2.sqlContext.sql(selectstatement).show

Refinements will be possible now that it works.


On Tue, Jun 6, 2017 at 4:06 PM, Jonathan Davidson <jm...@gmail.com>
wrote:

> Hello all, first time poster! I tried looking in forums and documentation
> but nothing seems to suit the specific nature of my question.
>
> I am trying to display counts based on checkbox functionality in
> Zeppelin.The base table can be pretty simple:
> %spark
> case class Data(a: Int, b: String)
> val df = Seq(
>   Data(1, "one"),
>   Data(2, "two"),
>   Data(3,null)).toDF()
> df.collect()
> df.registerTempTable("table")
>
> I want counts in SQL:
> %sql
> SELECT COUNT(a),COUNT(b) FROM table
> --gives 3 and 2, respectively
>
> I would think to use
> SELECT COUNT(${checkbox:test=a,a|b}) FROM table
> However, the problem is that it doesn't scale to multiple checked values
> at the same time. If you use more than one, it comes out as the equivalent
> of
> SELECT COUNT(a,b) FROM ds2
> which returns one value of 2; I want two values 3 and 2 as stated above
>
> Basically, I've tried turning the values into COUNT(a) and COUNT(b) but it
> gets confused:
> SELECT ${checkbox(whatever):test=COUNT(a)|COUNT(b),COUNT(a)|COUNT(b)}
> FROM ds2
> cannot recognize input near 'FROM' 'ds2' '<EOF>' in select clause; line 1
> pos 8
>
> I've also tried simple manipulation like adding stuff in parenthesis after
> ${checkbox...
> SELECT ${checkbox(whatever):test=a|b,a|b} FROM ds2
> cannot resolve 'awhateverb' given input columns: [a, b]; line 1 pos 7
>
> Does anyone have advice?
>
> Thanks,
> Jonathan
>