Posted to dev@spot.apache.org by Jeremy Nelson <je...@digitalminion.com> on 2020/03/30 19:47:42 UTC

Hive Partition Names for numbers (SPOT-19 and SPOT-239)

Greetings,

I noticed that SPOT-19 and SPOT-239 describe two aspects of the same
problem: Hive partition names are stored as unpadded numbers (e.g.,
y=2020/m=3/d=4), but some software (such as SPOT-ML) expects these
partition names to be two-digit zero-padded strings (e.g.,
y=2020/m=03/d=04).

We should identify which approach is preferable (break the schema, or break
the software?) and then harmonize all affected systems to do the same
thing, one way or the other.

Thanks,

Jeremy

Re: Hive Partition Names for numbers (SPOT-19 and SPOT-239)

Posted by Vivienne Pustell <vi...@yellingviv.com>.
Hi Jeremy,

My personal vote would be to align around the two-digit zero-padded
strings. It's the more standard format, and the consistency makes it
easier to run scripts as needed. In that sense, it should be viable to run
a script to convert the values that are currently stored, and then stay
consistent going forward. We've been running into issues like this at
$DAY_JOB and it's definitely worth getting consistent sooner rather
than later, IMO.

Cheers,
-Vivienne

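For illustration, the conversion suggested above might look roughly like
the following Python sketch. It only computes the zero-padded form of a
partition path; the function name pad_partition_path and the example paths
are hypothetical, and actually renaming the directories and updating the
Hive metastore is deliberately left out.

```python
import re

# Matches a month or day partition component such as "m=3" or "d=04".
# Assumption: only the m= and d= components need padding, as in the
# y=2020/m=3/d=4 example from this thread.
_PARTITION_RE = re.compile(r"^(m|d)=(\d{1,2})$")


def pad_partition_path(path: str) -> str:
    """Return `path` with m= and d= values zero-padded to two digits."""
    parts = []
    for component in path.split("/"):
        match = _PARTITION_RE.match(component)
        if match:
            key, value = match.groups()
            component = f"{key}={int(value):02d}"
        parts.append(component)
    return "/".join(parts)


if __name__ == "__main__":
    print(pad_partition_path("y=2020/m=3/d=4"))
```

Already-padded paths pass through unchanged, so a script built on this
could be re-run safely over a mix of old and new partitions.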