Posted to dev@spark.apache.org by masaki rikitoku <ri...@gmail.com> on 2015/02/26 10:31:44 UTC

number of partitions for hive schemaRDD

Hi all

I'm currently trying Spark SQL with HiveContext.

When I execute an HQL query like the following:

---

val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
import ctx._

val queries = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")

---

It seems that the number of partitions of the resulting queries
SchemaRDD is set to 1.

Is this the expected behavior for SchemaRDD, Spark SQL, and HiveContext?

Is there any way to set the number of partitions to an arbitrary value,
other than an explicit repartition?


Masaki Rikitoku



Re: number of partitions for hive schemaRDD

Posted by Cheng Lian <li...@gmail.com>.
Hi Masaki,

I guess what you saw is the partition number of the last stage, which
must be 1 to perform the global phase of LIMIT. To tune the partition
number of normal shuffles like joins, you may resort to
spark.sql.shuffle.partitions.
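
For reference, here is a minimal sketch against the Spark 1.2-era API
(the specific partition counts, 200 and 64, are just placeholder values)
showing both knobs: spark.sql.shuffle.partitions for shuffle stages, and
an explicit repartition to spread the single-partition LIMIT output back
out.

---

val ctx = new org.apache.spark.sql.hive.HiveContext(sc)

// Shuffle stages (joins, aggregations) will use this many partitions;
// the global LIMIT stage still collapses to a single partition.
ctx.setConf("spark.sql.shuffle.partitions", "200")

val queries = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")

// To fan the single-partition result back out for downstream work,
// an explicit repartition is still needed.
val spread = queries.repartition(64)

---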

Cheng
