Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/20 19:01:30 UTC

[GitHub] [hudi] PhantomHunt commented on issue #6936: [SUPPORT] NPE when trying to upsert with option hoodie.metadata.index.column.stats.enable : true.

PhantomHunt commented on issue #6936:
URL: https://github.com/apache/hudi/issues/6936#issuecomment-1286006358

   Hi,
   
   Apologies for the delayed response.
   
   Since the table contains my organization's sensitive information, I can
   only share the schema of the table.
   
   PostgreSQL CREATE TABLE script:
   
   CREATE TABLE IF NOT EXISTS myschema."VODProgramme"
   (
       "VODSRID" bigint,
       "ProgramVODSRID" bigint,
       "SeasonVODSRID" bigint,
       "ProgrammeId" character varying(400) COLLATE pg_catalog."default",
       "ProgrammeName" character varying(400) COLLATE pg_catalog."default",
       "ProgrammeLanguage" character varying(400) COLLATE pg_catalog."default",
       "ProgrammeGenre" character varying(400) COLLATE pg_catalog."default",
       "Synopsis" character varying(4000) COLLATE pg_catalog."default",
       "OperatorId" integer,
       "ProgramImage" character varying(1000) COLLATE pg_catalog."default",
       "ProgrammeUrl" character varying(1000) COLLATE pg_catalog."default",
       "Director" character varying(600) COLLATE pg_catalog."default",
       "ContentType" character varying(100) COLLATE pg_catalog."default",
       "ContentTypeID" smallint,
       "Cast" text COLLATE pg_catalog."default",
       "ProgramReleaseDate" date,
       "ChannelName" character varying(100) COLLATE pg_catalog."default",
       "SourceSeasonid" character varying(500) COLLATE pg_catalog."default",
       "SeasonName" character varying(400) COLLATE pg_catalog."default",
       "SeasonImage" character varying(1000) COLLATE pg_catalog."default",
       "EpisodeCount" integer,
       "SeasonCount" integer,
       "SourceEpisodeid" character varying(500) COLLATE pg_catalog."default",
       "EpisodeNumber" double precision,
       "EpisodeTitle" character varying(400) COLLATE pg_catalog."default",
       "EpisodeDescription" character varying(4000) COLLATE
   pg_catalog."default",
       "EpisodeReleasedate" date,
       "EpisodeImage" character varying(1000) COLLATE pg_catalog."default",
       "EpisodeVideoUrl" character varying(1000) COLLATE pg_catalog."default",
       "SocialType" character varying(50) COLLATE pg_catalog."default",
       "SocialCount" character varying(50) COLLATE pg_catalog."default",
       "ParentalRating" character varying(50) COLLATE pg_catalog."default",
       "SeasonNumber" integer,
       "Duration" character varying(50) COLLATE pg_catalog."default",
       "EpisodeGenre" character varying(400) COLLATE pg_catalog."default",
       "ProductionYear" character varying(50) COLLATE pg_catalog."default",
       "CommercialModel" character varying(50) COLLATE pg_catalog."default",
       "Commercials" character varying(50) COLLATE pg_catalog."default",
       "StartDate" date,
       "EndDate" date,
       "LastUpdatedOn" timestamp without time zone,
       "CreatedDate" timestamp without time zone,
       "CMSProgrammeID" bigint,
       "CatalogStatusID" smallint,
       "CMSLastUpdatedOn" timestamp without time zone,
       "VCMSLastUpdatedOn" timestamp without time zone,
       "ProgramAndroidDeeplink" character varying(1000) COLLATE
   pg_catalog."default",
       "SeasonAndroidDeeplink" character varying(1000) COLLATE
   pg_catalog."default",
       "EpisodeAndroidDeeplink" character varying(1000) COLLATE
   pg_catalog."default",
       "UDISNoofEpisodes" integer,
       "ApprovedStatus" smallint,
       "EpisodeiosDeeplink" character varying(1000) COLLATE
   pg_catalog."default",
       "SeasoniosDeeplink" character varying(1000) COLLATE
   pg_catalog."default",
       "ProgramiosDeeplink" character varying(1000) COLLATE
   pg_catalog."default"
   )
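   
   For reference, this is roughly how such a table would be read into Spark
   before the Hudi write. This is a minimal sketch only: the host,
   credentials, and connection details below are placeholders, not values
   from our actual job.
   
   # PySpark sketch: load the source table over JDBC (placeholders assumed).
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.appName("vod-ingest").getOrCreate()
   
   source_df = (
       spark.read.format("jdbc")
       .option("url", "jdbc:postgresql://<host>:5432/<database>")
       .option("dbtable", 'myschema."VODProgramme"')
       .option("user", "<user>")
       .option("password", "<password>")
       .option("driver", "org.postgresql.Driver")
       .load()
   )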
   
   We created the corresponding table via Athena. Please find the DDL below:
   
   CREATE EXTERNAL TABLE `vodprogramme_nonpartition`(
     `vodsrid` bigint,
     `programvodsrid` bigint,
     `seasonvodsrid` bigint,
     `programmeid` string,
     `programmename` string,
     `programmelanguage` string,
     `programmegenre` string,
     `synopsis` string,
     `operatorid` int,
     `programimage` string,
     `programmeurl` string,
     `director` string,
     `contenttype` string,
     `contenttypeid` int,
     `cast` string,
     `programreleasedate` date,
     `channelname` string,
     `sourceseasonid` string,
     `seasonname` string,
     `seasonimage` string,
     `episodecount` int,
     `seasoncount` int,
     `sourceepisodeid` string,
     `episodenumber` double,
     `episodetitle` string,
     `episodedescription` string,
     `episodereleasedate` date,
     `episodeimage` string,
     `episodevideourl` string,
     `socialtype` string,
     `socialcount` string,
     `parentalrating` string,
     `seasonnumber` int,
     `duration` string,
     `episodegenre` string,
     `productionyear` string,
     `commercialmodel` string,
     `commercials` string,
     `startdate` date,
     `enddate` date,
     `lastupdatedon` timestamp,
     `createddate` timestamp,
     `cmsprogrammeid` bigint,
     `catalogstatusid` int,
     `cmslastupdatedon` timestamp,
     `vcmslastupdatedon` timestamp,
     `programandroiddeeplink` string,
     `seasonandroiddeeplink` string,
     `episodeandroiddeeplink` string,
     `udisnoofepisodes` int,
     `approvedstatus` int,
     `episodeiosdeeplink` string,
     `seasoniosdeeplink` string,
     `programiosdeeplink` string)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hudi.hadoop.HoodieParquetInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
     'S3_PATH'
   
   
   Note: there are around 2 million records in this table. Fresh inserts
   complete without any errors, but upserting the same table again produces
   this error. The same error occurs for partitioned tables
   (PARTITIONED BY (`operatorid` int)) on the same dataset.
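   
   For illustration, a minimal sketch of the kind of write that triggers the
   error. The record key, precombine field, and S3 path below are assumptions
   for the example, not our exact job configuration; the column-stats option
   itself is the one from this issue.
   
   # PySpark sketch: the first run (fresh insert) succeeds, a second run
   # over the same data (upsert) fails with the NPE.
   hudi_options = {
       "hoodie.table.name": "vodprogramme_nonpartition",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.recordkey.field": "vodsrid",         # assumed key
       "hoodie.datasource.write.precombine.field": "lastupdatedon",  # assumed
       "hoodie.metadata.enable": "true",
       # The option under investigation:
       "hoodie.metadata.index.column.stats.enable": "true",
   }
   
   (source_df.write.format("hudi")
       .options(**hudi_options)
       .mode("append")
       .save("s3://<bucket>/<prefix>/vodprogramme_nonpartition"))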
   
   
   
   On Fri, Oct 14, 2022 at 12:59 PM Y Ethan Guo ***@***.***>
   wrote:
   
   > @PhantomHunt <https://github.com/PhantomHunt> Thanks for reporting the
   > issue. If possible, could you share the schema and sample data that can
   > reproduce this issue? The issue is likely related to a specific data type.
   > You may strip out any sensitive columns and data that are irrelevant to the
   > issue.

