You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Patrick Duin <pa...@gmail.com> on 2016/06/14 10:39:48 UTC

Create external table with partitions using sqlContext.createExternalTable

Hi,

I'm trying to use sqlContext.createExternalTable("my_table",
"/tmp/location/", "orc") to create tables. This is working fine for
non-partitioned tables. I'd like to create a partitioned table though, how
do I do that?

Can I add some information in the options: Map[String, String] parameter?

Thanks,
 Patrick

Re: Create external table with partitions using sqlContext.createExternalTable

Posted by Mich Talebzadeh <mi...@gmail.com>.
it is a good to be in control :)

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 14 June 2016 at 13:06, Patrick Duin <pa...@gmail.com> wrote:

> Thanks, yes I have something similar working as "the alternative
> solution". :)
>
> I was hoping to get away with not having to specify my schema so the
> sqlContext.createExternalTable seemed like a nice clean approach.
>
> 2016-06-14 13:59 GMT+02:00 Mich Talebzadeh <mi...@gmail.com>:
>
>> Try this this will work
>>
>> sql("use test")
>> sql("drop table if exists test.orctype")
>> var sqltext: String = ""
>> sqltext = """
>> CREATE EXTERNAL TABLE test.orctype(
>>          prod_id bigint,
>>          cust_id bigint,
>>          time_id timestamp,
>>          channel_id bigint,
>>          promo_id bigint,
>>          quantity_sold decimal(10,0),
>>          amount_sold decimal(10,0))
>> PARTITIONED BY (
>>                        year int,
>>                        month int)
>> CLUSTERED BY (
>>          prod_id,
>>          cust_id,
>>          time_id,
>>          channel_id,
>>          promo_id)
>> INTO 256 BUCKETS
>> STORED AS ORC
>> LOCATION '/tmp'
>> TBLPROPERTIES ( "orc.compress"="SNAPPY",
>>       "orc.create.index"="true",
>>
>> "orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
>>       "orc.bloom.filter.fpp"="0.05",
>>       "orc.stripe.size"="268435456",
>>       "orc.row.index.stride"="10000" )
>> """
>> sql(sqltext)
>> sql("select count(1) from test.orctype").show
>>
>> res2: org.apache.spark.sql.DataFrame = [result: string]
>> +---+
>> |_c0|
>> +---+
>> |  0|
>> +---+
>>
>> HTH
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 14 June 2016 at 11:39, Patrick Duin <pa...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to use sqlContext.createExternalTable("my_table",
>>> "/tmp/location/", "orc") to create tables. This is working fine for
>>> non-partitioned tables. I'd like to create a partitioned table though, how
>>> do I do that?
>>>
>>> Can I add some information in the options: Map[String, String]
>>> parameter?
>>>
>>> Thanks,
>>>  Patrick
>>>
>>
>>
>

Re: Create external table with partitions using sqlContext.createExternalTable

Posted by Patrick Duin <pa...@gmail.com>.
Thanks, yes I have something similar working as "the alternative solution".
:)

I was hoping to get away with not having to specify my schema so the
sqlContext.createExternalTable seemed like a nice clean approach.

2016-06-14 13:59 GMT+02:00 Mich Talebzadeh <mi...@gmail.com>:

> Try this this will work
>
> sql("use test")
> sql("drop table if exists test.orctype")
> var sqltext: String = ""
> sqltext = """
> CREATE EXTERNAL TABLE test.orctype(
>          prod_id bigint,
>          cust_id bigint,
>          time_id timestamp,
>          channel_id bigint,
>          promo_id bigint,
>          quantity_sold decimal(10,0),
>          amount_sold decimal(10,0))
> PARTITIONED BY (
>                        year int,
>                        month int)
> CLUSTERED BY (
>          prod_id,
>          cust_id,
>          time_id,
>          channel_id,
>          promo_id)
> INTO 256 BUCKETS
> STORED AS ORC
> LOCATION '/tmp'
> TBLPROPERTIES ( "orc.compress"="SNAPPY",
>       "orc.create.index"="true",
>
> "orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
>       "orc.bloom.filter.fpp"="0.05",
>       "orc.stripe.size"="268435456",
>       "orc.row.index.stride"="10000" )
> """
> sql(sqltext)
> sql("select count(1) from test.orctype").show
>
> res2: org.apache.spark.sql.DataFrame = [result: string]
> +---+
> |_c0|
> +---+
> |  0|
> +---+
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 June 2016 at 11:39, Patrick Duin <pa...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to use sqlContext.createExternalTable("my_table",
>> "/tmp/location/", "orc") to create tables. This is working fine for
>> non-partitioned tables. I'd like to create a partitioned table though, how
>> do I do that?
>>
>> Can I add some information in the options: Map[String, String] parameter?
>>
>> Thanks,
>>  Patrick
>>
>
>

Re: Create external table with partitions using sqlContext.createExternalTable

Posted by Mich Talebzadeh <mi...@gmail.com>.
Try this this will work

sql("use test")
sql("drop table if exists test.orctype")
var sqltext: String = ""
sqltext = """
CREATE EXTERNAL TABLE test.orctype(
         prod_id bigint,
         cust_id bigint,
         time_id timestamp,
         channel_id bigint,
         promo_id bigint,
         quantity_sold decimal(10,0),
         amount_sold decimal(10,0))
PARTITIONED BY (
                       year int,
                       month int)
CLUSTERED BY (
         prod_id,
         cust_id,
         time_id,
         channel_id,
         promo_id)
INTO 256 BUCKETS
STORED AS ORC
LOCATION '/tmp'
TBLPROPERTIES ( "orc.compress"="SNAPPY",
      "orc.create.index"="true",

"orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
      "orc.bloom.filter.fpp"="0.05",
      "orc.stripe.size"="268435456",
      "orc.row.index.stride"="10000" )
"""
sql(sqltext)
sql("select count(1) from test.orctype").show

res2: org.apache.spark.sql.DataFrame = [result: string]
+---+
|_c0|
+---+
|  0|
+---+

HTH


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 14 June 2016 at 11:39, Patrick Duin <pa...@gmail.com> wrote:

> Hi,
>
> I'm trying to use sqlContext.createExternalTable("my_table",
> "/tmp/location/", "orc") to create tables. This is working fine for
> non-partitioned tables. I'd like to create a partitioned table though, how
> do I do that?
>
> Can I add some information in the options: Map[String, String] parameter?
>
> Thanks,
>  Patrick
>