Posted to dev@trafodion.apache.org by Dave Birdsall <da...@esgyn.com> on 2016/01/14 22:31:11 UTC

Metadata tables use UTF8 but histograms tables use UCS2

Hi,



I’ve noticed that Trafodion metadata tables (e.g. trafodion."_MD_".objects)
commonly use UTF8 as the character set for their columns, whereas the
histograms table (trafodion.<pick your schema>.sb_histograms) uses UCS2.



I’m wondering if this was intended? Or perhaps it was an oversight from
when the predecessor product was ported to HBase? (In the latter case, I’m
wondering if it makes sense to open a JIRA to have them converted?)



Dave

Re: Metadata tables use UTF8 but histograms tables use UCS2

Posted by Qifan Chen <qi...@esgyn.com>.
Converting them would also require an upgrade path for the existing
histograms tables at customer site(s).

Given the number of tasks on hand, I think this is probably not the most
urgent.

Thanks --Qifan

On Thu, Jan 14, 2016 at 3:41 PM, Hans Zeller <ha...@esgyn.com> wrote:

> Hi Dave, that is for historical reasons. Initially, we supported only
> ISO8859-1 column and table names, but we had a UCS2 data type. Therefore,
> the column values in histograms had to be UCS2 as well. We had a project to
> change all the places where we deal with ANSI names to UCS2, but that would
> have been a big project.
>
> Then, along came UTF-8, making it so much easier to deal with Unicode in
> C++ programs. For Trafodion, we decided that the metadata columns for names
> would change from CHAR(n) CHARACTER SET ISO88591 to CHAR(n BYTES) CHARACTER
> SET UTF8. That required relatively little change in the code, since both
> can be represented by char * or NAString and both occupy the same number
> of bytes. The histograms table was already in UCS2 and it was not changed.
> If we had had UTF-8 from the start, we would probably have chosen that
> instead for histograms.
>
> Hans



-- 
Regards, --Qifan

Re: Metadata tables use UTF8 but histograms tables use UCS2

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Dave, that is for historical reasons. Initially, we supported only
ISO8859-1 column and table names, but we had a UCS2 data type. Therefore,
the column values in histograms had to be UCS2 as well. We had a project to
change all the places where we deal with ANSI names to UCS2, but that would
have been a big project.

Then, along came UTF-8, making it so much easier to deal with Unicode in
C++ programs. For Trafodion, we decided that the metadata columns for names
would change from CHAR(n) CHARACTER SET ISO88591 to CHAR(n BYTES) CHARACTER
SET UTF8. That required relatively little change in the code, since both
can be represented by char * or NAString and both occupy the same number
of bytes. The histograms table was already in UCS2 and it was not changed.
If we had had UTF-8 from the start, we would probably have chosen that
instead for histograms.

Hans
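
To make the point about lengths concrete, here is a minimal sketch using
plain Python codecs. It is an illustration only, not Trafodion code: the
helper name encoded_sizes is hypothetical, and UCS2 is approximated with
UTF-16LE, which gives the same bytes for characters in the Basic
Multilingual Plane. It shows that an ASCII-only name has identical bytes
in ISO88591 and UTF8, while UCS2 needs two bytes per character.

```python
# Illustration only: standard Python codecs, not Trafodion code.
def encoded_sizes(name):
    """Return the byte length of `name` in each character set discussed."""
    return {
        "ISO88591": len(name.encode("latin-1")),
        "UTF8": len(name.encode("utf-8")),
        # Assumption: UTF-16LE stands in for UCS2 (same bytes within the BMP).
        "UCS2": len(name.encode("utf-16-le")),
    }

ascii_name = "SB_HISTOGRAMS"
sizes = encoded_sizes(ascii_name)

# For ASCII-only names the ISO88591 and UTF8 byte strings are identical,
# so code that handles them as single-byte strings keeps working.
same_bytes = ascii_name.encode("latin-1") == ascii_name.encode("utf-8")

# A non-ASCII Latin-1 character takes more bytes in UTF8 than in ISO88591.
accented = encoded_sizes("é")

print(sizes, same_bytes, accented)
```

The accented case is the one to watch: a column declared in bytes holds
fewer such characters in UTF8 than the same byte count did in ISO88591.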
