Posted to dev@hudi.apache.org by Vinoth Chandar <vi...@apache.org> on 2019/05/01 11:11:21 UTC

Re: multi-partitioned hudi table | partitions not created

I recommend using the HiveSync tool to manage the registration and not do
it manually.
Otherwise, what you see is expected behavior: part1 and part2 will be in the
file if they were in the data frame.

On Mon, Apr 29, 2019 at 11:11 PM SATISH SIDNAKOPPA <
satish.sidnakoppa.it@gmail.com> wrote:

> files in hdfs
>
>
> /apps/hive/warehouse/emp_multi_partkey/part1=A/part2=2018
>
> manually created table:
> CREATE EXTERNAL TABLE `emp_multi_partkey`(
>   `_hoodie_commit_time` string,
>   `_hoodie_commit_seqno` string,
>   `_hoodie_record_key` string,
>   `_hoodie_partition_path` string,
>   `_hoodie_file_name` string,
>   `emp_id` string,
>   `part_col` string)
> PARTITIONED BY (
>   `part1` string,
>   `part2` string)
>
> these 2 columns exist in the dataset too:
>
> concat('part1=',part1,'/part2=',part2) as part_col
> where part1=A and part2=2018
>
> I am able to update and delete records. Will there be any gap if this
> process is followed?
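The path construction above can be illustrated with a small sketch. This is hypothetical helper code, not Hudi internals: it just mirrors how the `concat('part1=',part1,'/part2=',part2)` expression yields the Hive-style directory path shown for HDFS.

```python
# Hypothetical sketch (not Hudi code): how a precomputed partition-path
# column like part_col maps a record to a Hive-style partition directory.
def partition_path(part1, part2):
    # Mirrors the SQL: concat('part1=', part1, '/part2=', part2)
    return f"part1={part1}/part2={part2}"

base = "/apps/hive/warehouse/emp_multi_partkey"
print(f"{base}/{partition_path('A', '2018')}")
# /apps/hive/warehouse/emp_multi_partkey/part1=A/part2=2018
```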
>
> On Tue, Apr 30, 2019 at 11:36 AM SATISH SIDNAKOPPA <
> satish.sidnakoppa.it@gmail.com> wrote:
>
> > Hi Vinoth,
> >
> > I created the multi_part as below.
> >
> > in dataset ---> concat('part1=',SUBSTR(emp_name,1,1),'/part2=','2018') as
> > part_col
> > in spark.write hudi set ------>
> > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY,"part_col")
> >
> > files in hdfs
> >
> >
> > alter table hudi.emp_multi_partkey add partition(part1='A',part2='2018');
> >
> >
> >
> >
> > On Mon, Apr 29, 2019 at 8:30 PM Vinoth Chandar <vi...@apache.org> wrote:
> >
> >> Hi Satish,
> >>
> >> That's because the default KeyGenerator class only reads in a single
> >> field to partition on. What you are expecting is a composite key.
> >>
> >> Nishith has one in the test suite PR
> >>
> >>
> >> https://github.com/apache/incubator-hudi/pull/623/files#diff-8814d5eb596f19bc9a87e419453fd7c8
> >>
> >> We plan to add this to the main code. For now, you can copy the class
> >> and see if it solves your need? KeyGenerator is pluggable anyway.
> >>
> >> Thanks
> >> Vinoth
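A composite key generator along the lines Vinoth describes can be sketched as follows. This is hypothetical logic, not the actual Hudi class from the linked PR: the idea is to split the configured comma-separated field list and join `field=value` pairs with `/`.

```python
# Hypothetical sketch of a composite (multi-field) partition-path generator,
# in the spirit of the KeyGenerator referenced above; not the actual Hudi class.
def composite_partition_path(record, partition_fields):
    # partition_fields: e.g. ["dept", "region"] parsed from "dept,region"
    # Missing fields fall back to a "default" placeholder value.
    return "/".join(f"{f}={record.get(f, 'default')}" for f in partition_fields)

record = {"dept": "HR", "region": "AP", "emp_id": "42"}
fields = "dept,region".split(",")
print(composite_partition_path(record, fields))  # dept=HR/region=AP
```

With such a generator plugged in, the writer would produce paths like `/dept=HR/region=AP` directly, without precomputing a `part_col` column.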
> >>
> >> On Mon, Apr 29, 2019 at 7:20 AM SATISH SIDNAKOPPA <
> >> satish.sidnakoppa.it@gmail.com> wrote:
> >>
> >> > Hi Team,
> >> >
> >> >
> >> > I have to store data by department and region.
> >> > /dept=HR/region=AP
> >> > /dept=OPS/region=AP
> >> > /dept=HR/region=SA
> >> > /dept=OPS/region=SA
> >> >
> >> > so the partitioned table created will have multiple partition keys
> >> >
> >> >
> >> > I tried passing the value as comma separated (dept,region):
> >> >
> >> > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY,"dept,region")
> >> >
> >> > and dot separated:
> >> >
> >> > .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY,"dept.region")
> >> >
> >> > but the partitions were not created in HDFS. All the data was added to
> >> > the default partition.
> >> >
> >> >
> >> > Could you guide me on the format for passing multiple partition fields
> >> > to the Spark write for a Hudi dataset?
> >> >
> >> > regards
> >> > Satish S
> >> >
> >>
> >
>
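The behavior in the original question (everything landing in the default partition) follows from Vinoth's explanation that the default KeyGenerator reads a single field. A hypothetical sketch of that single-field lookup, not the actual Hudi implementation: no column is literally named "dept,region", so the lookup misses and the fallback value is used for every record.

```python
# Hypothetical sketch of single-field partition-path resolution, illustrating
# why passing "dept,region" as one field name sends all rows to "default":
# the record has no column literally named "dept,region".
def simple_partition_path(record, partition_field):
    value = record.get(partition_field)
    return str(value) if value is not None else "default"

record = {"dept": "HR", "region": "AP"}
print(simple_partition_path(record, "dept"))         # HR
print(simple_partition_path(record, "dept,region"))  # default
```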