Posted to user@spark.apache.org by Ravisankar Mani <rr...@gmail.com> on 2015/10/12 16:05:05 UTC

pagination spark sq

Hi everyone,

Can you please share an optimized query for pagination in Spark SQL?


MS SQL Server supports an "OFFSET" clause for selecting a specific
range of rows.

Please see the following query:

Select BusinessEntityID,[FirstName], [LastName],[JobTitle]
from HumanResources.vEmployee
Order By BusinessEntityID
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY

Does Spark SQL support the OFFSET clause? Kindly share any useful details.

Regards,
Ravi

Re: pagination spark sq

Posted by Richard Hillegas <rh...@us.ibm.com>.
Hi Ravi,

If you build Spark with Hive support, then your sqlContext variable will be
an instance of HiveContext and you will enjoy the full capabilities of the
Hive query language rather than the more limited capabilities of Spark SQL.
However, even Hive QL does not support the OFFSET clause, at least
according to the Hive language manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual. Hive does
support the LIMIT clause. The following works for me:

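import org.apache.spark.sql._
import org.apache.spark.sql.types._

// In a spark-shell built with Hive support, sqlContext is a HiveContext.
val hc = sqlContext

// Two non-nullable columns: x (integer) and y (double).
val schema =
  StructType(
    StructField("x", IntegerType, nullable=false) ::
    StructField("y", DoubleType, nullable=false) :: Nil)

val rdd = sc.parallelize(
  Row(1, 1.0) :: Row(2, 1.34) :: Row(3, 2.3) :: Row(4, 2.5) :: Nil)

val df = hc.createDataFrame(rdd, schema)

// Register the DataFrame so that SQL queries can refer to it by name.
df.registerTempTable("test_data")

// LIMIT is accepted here; OFFSET is not.
hc.sql("SELECT * FROM test_data LIMIT 3").show()

exit()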


So, to sum up, Hive QL supports a subset of the MySQL LIMIT/OFFSET syntax
(limit, no offset) but does not support the SQL Standard language for
returning a block of rows offset into a large query result.
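
Since OFFSET is not available, one possible workaround is to number the
sorted rows yourself and keep only the range you want. The following is
only a sketch and not something taken from the Hive manual: the helper
name "page" and its parameters are hypothetical, and it assumes the
Spark 1.x DataFrame API used above. It sorts the whole DataFrame for
every page, so it may be slow on large inputs.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper (not part of Spark): returns the rows at positions
// [offset, offset + limit) of df after sorting by orderCol.
def page(df: DataFrame, orderCol: String, offset: Long, limit: Int): DataFrame = {
  // Sort first so that row positions are well defined.
  val sorted = df.orderBy(orderCol)

  // zipWithIndex pairs every row with its 0-based position in the sorted RDD.
  val indexed = sorted.rdd.zipWithIndex()

  // Keep only the rows whose position falls inside the requested page.
  val slice = indexed
    .filter { case (_, idx) => idx >= offset && idx < offset + limit }
    .map { case (row, _) => row }

  // Rebuild a DataFrame with the original schema.
  df.sqlContext.createDataFrame(slice, df.schema)
}
```

With the df defined above, page(df, "x", 1, 2).show() should skip the
first row and display the next two.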

Hope this helps,
Rick Hillegas


