You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Patrick Duin <pa...@gmail.com> on 2016/06/14 10:39:48 UTC
Create external table with partitions using sqlContext.createExternalTable
Hi,
I'm trying to use sqlContext.createExternalTable("my_table",
"/tmp/location/", "orc") to create tables. This is working fine for
non-partitioned tables. I'd like to create a partitioned table though, how
do I do that?
Can I add some information in the options: Map[String, String] parameter?
Thanks,
Patrick
Re: Create external table with partitions using sqlContext.createExternalTable
Posted by Mich Talebzadeh <mi...@gmail.com>.
it is a good to be in control :)
Dr Mich Talebzadeh
LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 14 June 2016 at 13:06, Patrick Duin <pa...@gmail.com> wrote:
> Thanks, yes I have something similar working as "the alternative
> solution". :)
>
> I was hoping to get away with not having to specify my schema so the
> sqlContext.createExternalTable seemed like a nice clean approach.
>
> 2016-06-14 13:59 GMT+02:00 Mich Talebzadeh <mi...@gmail.com>:
>
>> Try this this will work
>>
>> sql("use test")
>> sql("drop table if exists test.orctype")
>> var sqltext: String = ""
>> sqltext = """
>> CREATE EXTERNAL TABLE test.orctype(
>> prod_id bigint,
>> cust_id bigint,
>> time_id timestamp,
>> channel_id bigint,
>> promo_id bigint,
>> quantity_sold decimal(10,0),
>> amount_sold decimal(10,0))
>> PARTITIONED BY (
>> year int,
>> month int)
>> CLUSTERED BY (
>> prod_id,
>> cust_id,
>> time_id,
>> channel_id,
>> promo_id)
>> INTO 256 BUCKETS
>> STORED AS ORC
>> LOCATION '/tmp'
>> TBLPROPERTIES ( "orc.compress"="SNAPPY",
>> "orc.create.index"="true",
>>
>> "orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
>> "orc.bloom.filter.fpp"="0.05",
>> "orc.stripe.size"="268435456",
>> "orc.row.index.stride"="10000" )
>> """
>> sql(sqltext)
>> sql("select count(1) from test.orctype").show
>>
>> res2: org.apache.spark.sql.DataFrame = [result: string]
>> +---+
>> |_c0|
>> +---+
>> | 0|
>> +---+
>>
>> HTH
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 14 June 2016 at 11:39, Patrick Duin <pa...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to use sqlContext.createExternalTable("my_table",
>>> "/tmp/location/", "orc") to create tables. This is working fine for
>>> non-partitioned tables. I'd like to create a partitioned table though, how
>>> do I do that?
>>>
>>> Can I add some information in the options: Map[String, String]
>>> parameter?
>>>
>>> Thanks,
>>> Patrick
>>>
>>
>>
>
Re: Create external table with partitions using sqlContext.createExternalTable
Posted by Patrick Duin <pa...@gmail.com>.
Thanks, yes I have something similar working as "the alternative solution".
:)
I was hoping to get away with not having to specify my schema so the
sqlContext.createExternalTable seemed like a nice clean approach.
2016-06-14 13:59 GMT+02:00 Mich Talebzadeh <mi...@gmail.com>:
> Try this this will work
>
> sql("use test")
> sql("drop table if exists test.orctype")
> var sqltext: String = ""
> sqltext = """
> CREATE EXTERNAL TABLE test.orctype(
> prod_id bigint,
> cust_id bigint,
> time_id timestamp,
> channel_id bigint,
> promo_id bigint,
> quantity_sold decimal(10,0),
> amount_sold decimal(10,0))
> PARTITIONED BY (
> year int,
> month int)
> CLUSTERED BY (
> prod_id,
> cust_id,
> time_id,
> channel_id,
> promo_id)
> INTO 256 BUCKETS
> STORED AS ORC
> LOCATION '/tmp'
> TBLPROPERTIES ( "orc.compress"="SNAPPY",
> "orc.create.index"="true",
>
> "orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
> "orc.bloom.filter.fpp"="0.05",
> "orc.stripe.size"="268435456",
> "orc.row.index.stride"="10000" )
> """
> sql(sqltext)
> sql("select count(1) from test.orctype").show
>
> res2: org.apache.spark.sql.DataFrame = [result: string]
> +---+
> |_c0|
> +---+
> | 0|
> +---+
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 June 2016 at 11:39, Patrick Duin <pa...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to use sqlContext.createExternalTable("my_table",
>> "/tmp/location/", "orc") to create tables. This is working fine for
>> non-partitioned tables. I'd like to create a partitioned table though, how
>> do I do that?
>>
>> Can I add some information in the options: Map[String, String] parameter?
>>
>> Thanks,
>> Patrick
>>
>
>
Re: Create external table with partitions using sqlContext.createExternalTable
Posted by Mich Talebzadeh <mi...@gmail.com>.
Try this this will work
sql("use test")
sql("drop table if exists test.orctype")
var sqltext: String = ""
sqltext = """
CREATE EXTERNAL TABLE test.orctype(
prod_id bigint,
cust_id bigint,
time_id timestamp,
channel_id bigint,
promo_id bigint,
quantity_sold decimal(10,0),
amount_sold decimal(10,0))
PARTITIONED BY (
year int,
month int)
CLUSTERED BY (
prod_id,
cust_id,
time_id,
channel_id,
promo_id)
INTO 256 BUCKETS
STORED AS ORC
LOCATION '/tmp'
TBLPROPERTIES ( "orc.compress"="SNAPPY",
"orc.create.index"="true",
"orc.bloom.filter.columns"="PROD_ID,CUST_ID,TIME_ID,CHANNEL_ID,PROMO_ID",
"orc.bloom.filter.fpp"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="10000" )
"""
sql(sqltext)
sql("select count(1) from test.orctype").show
res2: org.apache.spark.sql.DataFrame = [result: string]
+---+
|_c0|
+---+
| 0|
+---+
HTH
Dr Mich Talebzadeh
LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 14 June 2016 at 11:39, Patrick Duin <pa...@gmail.com> wrote:
> Hi,
>
> I'm trying to use sqlContext.createExternalTable("my_table",
> "/tmp/location/", "orc") to create tables. This is working fine for
> non-partitioned tables. I'd like to create a partitioned table though, how
> do I do that?
>
> Can I add some information in the options: Map[String, String] parameter?
>
> Thanks,
> Patrick
>