Posted to user@hive.apache.org by PengHui Li <co...@gmail.com> on 2019/04/04 08:22:38 UTC

How to implement partitioned external table.

Hi guys,

I am integrating Hive and Pulsar (http://pulsar.apache.org) through
HiveStorageHandler and HiveMetaHook. I want to add a feature that divides the
data into several parts (Pulsar topics) when a table uses Hive's
`PARTITIONED BY`, but I don't know how to implement it on top of
HiveStorageHandler and HiveMetaHook.

- How do I get the Hive table's partition definition?
- When a user inserts data into the Hive table, how do I determine which
partition the data should be placed in?
- When a user selects data from the Hive table, how do I determine which
partition the data is in?

Looking forward to your reply

- Penghui Li

Re: How to implement partitioned external table.

Posted by PengHui Li <co...@gmail.com>.
@Zoltan

I appreciate your reply. I will open a new topic on the developer list.

Best regards
Penghui

Zoltan Haindrich <ki...@rxd.hu> wrote on Thu, Apr 11, 2019 at 4:21 PM:

>
>
> On 4/4/19 10:22 AM, PengHui Li wrote:
> > Hi guys,
> >
> > I am integrating hive and pulsar(http://pulsar.apache.org) by
> > HiveStorageHandler and HiveMetaHook, I want to add a feature can divide the
> > data into several parts(pulsar topics) when use hive `PARTITIONED BY`. But
> > don't know how to implement it based on HiveStorageHandler and HiveMetaHook.
>
> I think you should be able to access the table's properties from the
> StorageHandler (and get access to the pulsar server address/etc from there).
>
> About supporting topics: I think instead of adding some features to
> support "partitioned by"
> the storage handler could get into predicate push down...by making the
> topic a column.
> To get some ideas how to do that I would first take a look at the jdbc
> storage handler(or hbase).
>
> note: I think this topic might better fit the developer list.
>
> cheers,
> Zoltan
>

Re: How to implement partitioned external table.

Posted by Zoltan Haindrich <ki...@rxd.hu>.

On 4/4/19 10:22 AM, PengHui Li wrote:
> Hi guys,
> 
> I am integrating hive and pulsar(http://pulsar.apache.org) by HiveStorageHandler and HiveMetaHook, I want to add a feature can divide the data
> into several parts(pulsar topics) when use hive `PARTITIONED BY`. But don't know how to implement it based on HiveStorageHandler and HiveMetaHook.

I think you should be able to access the table's properties from the StorageHandler (and get the Pulsar server address, etc. from there).
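
As a rough illustration of reading connection settings from table properties, here is a minimal sketch in plain Java. It does not use the Hive classes; in a real handler the same logic would probably sit in `HiveStorageHandler.configureInputJobProperties`, with the `Properties` object coming from `TableDesc.getProperties()`. The `pulsar.` prefix and the property names `pulsar.service.url` and `pulsar.topic.prefix` are assumptions made up for this example, not names defined by Hive or Pulsar.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PulsarTableProps {
    // Hypothetical prefix for connector settings stored in TBLPROPERTIES.
    static final String PREFIX = "pulsar.";

    // Copies every "pulsar.*" table property into the per-job property map,
    // leaving unrelated table properties behind.
    static Map<String, String> extractJobProperties(Properties tableProps) {
        Map<String, String> jobProps = new HashMap<>();
        for (String key : tableProps.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                jobProps.put(key, tableProps.getProperty(key));
            }
        }
        return jobProps;
    }

    public static void main(String[] args) {
        Properties tableProps = new Properties();
        // Hypothetical values a user might set in TBLPROPERTIES.
        tableProps.setProperty("pulsar.service.url", "pulsar://localhost:6650");
        tableProps.setProperty("pulsar.topic.prefix", "events");
        tableProps.setProperty("comment", "not a connector setting");
        System.out.println(extractJobProperties(tableProps));
    }
}
```

Filtering by prefix keeps unrelated table metadata out of the job configuration; whether a prefix is the right convention is a design choice for the connector.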

About supporting topics: I think that instead of adding features to support "partitioned by",
the storage handler could use predicate pushdown, by exposing the topic as a column.
To get some ideas on how to do that, I would first take a look at the JDBC storage handler (or the HBase one).
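
The topic-as-column idea can be sketched without any Hive dependency. The example below assumes a WHERE clause that is a simple conjunction of conditions and a virtual column named `topic` (a hypothetical name). It splits the conditions into those the handler takes over and those left for Hive to evaluate, then uses the pushed conditions to pick the topics to read. In Hive itself this split is what `HiveStoragePredicateHandler.decomposePredicate` is meant to return; the `Condition` and `Decomposed` classes here are simplified stand-ins, not the Hive types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TopicPushdownSketch {
    // Hypothetical name of the virtual column that exposes the Pulsar topic.
    static final String TOPIC_COLUMN = "topic";

    // One conjunct of a WHERE clause, e.g. topic = 'orders'.
    static final class Condition {
        final String column;
        final String op;
        final String value;

        Condition(String column, String op, String value) {
            this.column = column;
            this.op = op;
            this.value = value;
        }
    }

    // Result of the split: what the handler evaluates and what Hive keeps.
    static final class Decomposed {
        final List<Condition> pushed = new ArrayList<>();
        final List<Condition> residual = new ArrayList<>();
    }

    // Pushes down only equality conditions on the topic column.
    static Decomposed decompose(List<Condition> conjuncts) {
        Decomposed d = new Decomposed();
        for (Condition c : conjuncts) {
            if (TOPIC_COLUMN.equals(c.column) && "=".equals(c.op)) {
                d.pushed.add(c);
            } else {
                d.residual.add(c);
            }
        }
        return d;
    }

    // Keeps the topics that satisfy every pushed condition.
    // With nothing pushed down, all topics are scanned.
    static List<String> topicsToRead(Decomposed d, List<String> allTopics) {
        List<String> selected = new ArrayList<>(allTopics);
        for (Condition c : d.pushed) {
            selected.removeIf(t -> !t.equals(c.value));
        }
        return selected;
    }

    public static void main(String[] args) {
        // Models: WHERE topic = 'orders' AND amount > 100
        List<Condition> where = Arrays.asList(
                new Condition("topic", "=", "orders"),
                new Condition("amount", ">", "100"));
        Decomposed d = decompose(where);
        // prints [orders]
        System.out.println(topicsToRead(d, Arrays.asList("orders", "payments")));
    }
}
```

Here `amount > 100` stays in the residual list, so Hive would still filter rows after the handler has narrowed the read down to the `orders` topic.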

note: I think this topic might better fit the developer list.

cheers,
Zoltan