You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hawq.apache.org by Goden Yao <go...@apache.org> on 2016/07/18 18:12:16 UTC

Re: PXF读取HDFS数据NULL问题

Hi 吴彪,

You need to send email to user@hawq.incubator.apache.org.
dev@ only subscribers can see and receive the email, even you sent to dev,
no one will see your email.

I'm adding the user mailing list here and will circle back to dev@ list.
For HAWQ community, please check the following thread for context and I'm
re-attaching the table definition below.

-Goden

=== table definition ==

CREATE TABLE gdm_m04_ord_det_sum(
  sale_ord_det_id varchar ,
  sale_ord_id varchar ,
  parent_sale_ord_id varchar ,
  sale_ord_type_cd varchar ,
  sale_ord_cate_cd varchar ,
  user_id varchar ,
  user_log_acct varchar ,
  user_reg_tm varchar ,
  reg_user_type_cd varchar ,
  user_lv_cd varchar ,
  ord_create_user_lv_cd varchar ,
  campus_user_flag varchar ,
  school_agent_acct_num varchar ,
  user_site_county_id varchar ,
  user_site_city_id varchar ,
  user_site_province_id varchar ,
  user_site_school_id varchar ,
  sale_ord_valid_flag varchar ,
  ord_syn_status_cd varchar ,
  ord_status_cd_1 varchar ,
  ord_status_cd_2 varchar ,
  ord_flag varchar ,
  ord_flag_211 varchar ,
  present_cps_flag varchar ,
  easy_pur_flag varchar ,
  invt_piority_ship_flag varchar ,
  delv_way_cd varchar ,
  pay_mode_cd varchar ,
  sale_qtty bigint ,
  before_prefr_unit_price double precision precision ,
  after_prefr_unit_price double precision precision ,
  before_prefr_amount double precision precision ,
  after_prefr_amount double precision precision ,
  total_offer_amount double precision precision ,
  sku_offer_amount double precision precision ,
  gp_offer_amount double precision precision ,
  suit_offer_amount double precision precision ,
  full_minus_offer_amount double precision precision ,
  telecom_point_offer_amount double precision precision ,
  other_offer_amount double precision precision ,
  dq_and_jq_pay_amount double precision precision ,
  pop_shop_dq_pay_amount double precision precision ,
  pop_shop_jq_pay_amount double precision precision ,
  gift_cps_pay_amount double precision precision ,
  mobile_red_packet_pay_amount double precision precision ,
  acct_bal_pay_amount double precision precision ,
  sku_freight_amount double precision precision ,
  user_actual_pay_amount double precision precision ,
  pay_instalments_fee double precision precision ,
  pay_instalments_ord_flag varchar ,
  pay_instalments_periods bigint ,
  pay_instalments_rate double precision precision ,
  pay_instalments_ord_id varchar ,
  pay_online_bank_name varchar ,
  carry_bill_id varchar ,
  carrier_name varchar ,
  carrier_tel varchar ,
  rev_addr_province_id varchar ,
  rev_addr_city_id varchar ,
  rev_addr_county_id varchar ,
  rev_addr_school_id varchar ,
  sale_ord_tm varchar ,
  sale_ord_dt varchar ,
  sale_ord_det_tm varchar ,
  out_wh_tm varchar ,
  ord_complete_tm varchar ,
  subd_num varchar ,
  delv_center_num varchar ,
  store_id varchar ,
  delv_station_id varchar ,
  campus_sta_flag varchar ,
  major_supp_brevity_code varchar ,
  purchaser_num varchar ,
  purchaser_erp_acct varchar ,
  otc_operr varchar ,
  item_sku_id varchar ,
  item_id varchar ,
  item_name varchar ,
  free_goods_flag varchar ,
  sku_valid_flag varchar ,
  brandname varchar ,
  item_first_cate_cd varchar ,
  item_first_cate_name varchar ,
  item_second_cate_cd varchar ,
  item_second_cate_name varchar ,
  item_third_cate_cd varchar ,
  item_third_cate_name varchar ,
  qgp varchar ,
  item_status_cd varchar ,
  item_type varchar ,
  volume double precision precision ,
  wt double precision precision ,
  item_origin varchar ,
  shelves_tm varchar ,
  utc_tm varchar ,
  sale_qtty_lim bigint ,
  sku_inventory_status_cd varchar ,
  otc_tm varchar ,
  support_cash_on_deliver_flag varchar ,
  gp_item_flag varchar ,
  resv_flag varchar ,
  sku_jd_prc double precision precision ,
  sku_mkt_prc double precision precision ,
  sku_stk_prc double precision precision ,
  sku_wh_qtn double precision precision ,
  sku_reduce_prc_tm varchar ,
  sku_rebate_amount double precision precision ,
  delv_staf_name varchar ,
  ps_carrier_name varchar ,
  ord_item_height double precision precision ,
  ord_item_width double precision precision ,
  ord_item_len double precision precision ,
  item_last_update_tm varchar ,
  pac_propt varchar ,
  item_otc_status_cd varchar ,
  quality_assurance_period varchar ,
  on_or_off_shelves_tm varchar ,
  ware_item_name varchar ,
  vat_invoice_flag varchar ,
  pre_sorting_flag varchar ,
  delv_station_name varchar ,
  place_ord_ip varchar ,
  last_update_tm varchar ,
  present_point bigint ,
  user_point bigint ,
  reg_user_valid_flag varchar ,
  corp varchar ,
  user_reg_addr varchar ,
  reg_birthday varchar ,
  member_reg_gender varchar ,
  user_name varchar ,
  split_status_cd varchar ,
  printer_name varchar ,
  sorter_name varchar ,
  cashier_name varchar ,
  shipper_name varchar ,
  scan_emply_name varchar ,
  self_carry_sta_name varchar ,
  check_account_tm varchar ,
  pay_tm varchar ,
  self_carry_status_cd varchar ,
  print_status_cd varchar ,
  ship_sorting_tm varchar ,
  package_tm varchar ,
  print_tm varchar ,
  cnee_mail_addr varchar ,
  cnee_mobile_num varchar ,
  cnee_fixed_tel varchar ,
  rev_zip varchar ,
  rev_addr varchar ,
  cnee_name varchar ,
  mc_ord_flag varchar ,
  vertical_web_type_cd varchar ,
  campus_ord_flag varchar ,
  night_delv_flag varchar ,
  sorting_tm varchar ,
  free_goods_sku_id varchar ,
  metro_self_carry_flag varchar ,
  ord_proc_tm varchar ,
  ord_cancel_tm varchar ,
  ord_cancel_rsn_cd varchar ,
  server_name varchar ,
  ps_delv_way_cd varchar ,
  cancel_flag varchar ,
  sale_ord_cancel_type_cd varchar ,
  user_payable_pay_amount double precision precision ,
  pop_vender_pty_id varchar ,
  pop_vender_id varchar ,
  pop_shop_id varchar ,
  phy_pop_flag varchar ,
  pop_operr_erp_acct varchar ,
  pop_operr_dept_num varchar ,
  sorting_center_pop_rev_tm varchar ,
  major_item_first_cate_id varchar ,
  discount_rate double precision precision ,
  pop_item_gross_profit double precision precision ,
  chan_client_type_cd varchar ,
  pop_vender_own_afs_flag varchar ,
  flash_gp_flag bigint ,
  flash_gp_offer_amount double precision precision ,
  otc_operr_erp_id varchar ,
  work_post_cd varchar ,
  yixun_point_pay_amount double precision precision ,
  sku_freight_coupon_amount double precision precision ,
  logic_delv_center_num varchar ,
  brand_cd varchar ,
  main_sku_id varchar ,
  pop_shop_lim_sku_jq_pay_amount double precision precision ,
  pop_shop_lim_sku_dq_pay_amount double precision precision ,
  engy_subsidy_amount double precision precision ,
  jbean_pay_amount double precision precision ,
  jq_pay_amount double precision precision ,
  dq_pay_amount double precision precision ,
  union_coupon_pay_amount double precision precision )
distributed by (sale_ord_det_id);

CREATE external table ex_gdm_m04_ord_det_sum_error(like gdm_m04_ord_det_sum)
location('pxf://adhoc/user/hadoop/warehouse/hawq_test.db/gdm_m04_ord_det_sum/dp=HISTORY/dt=2016-07-01'
'?Profile=HdfsTextSimple'
'&COMPRESSION_CODEC=com.hadoop.compression.lzo.LzopCodec'
)
format 'TEXT'
;



On Sun, Jul 17, 2016 at 8:48 PM 吴彪 <sy...@jd.com> wrote:

> The HAWQ version is “HAWQ version 2.0.1.0 build dev”
>
> External table definition of "gdm_m04_ord_det_sum" refer to the attachment
>
>
>
> Thanks
>
>
>
> 吴彪
>
> JD.COM 京东 【大数据部-大数据研发部-平台基础研发部】
>
> ---------------------------------------------------------------
>
> 地址:北京市朝阳区北辰世纪中心A座10层
>
> 邮编:100195
>
> 手机:18810848987
>
> 邮箱:sywubiao@jd.com <sy...@jd.com>
>
>
>
> *发件人:* Goden Yao [mailto:godenyao@apache.org]
> *发送时间:* 2016年7月16日 2:32
> *收件人:* dev <de...@hawq.incubator.apache.org>; sywubiao@jd.com;
> zhoulongbo@jd.com
> *主题:* Re: PXF读取HDFS数据NULL问题
>
>
>
> We do handle NULL for int and bigint.
>
> based on your error message, it doesn't seem to be a NULL issue but more
> data type conversion issue.
>
>
>
> Could you give us:
>
> 1. HAWQ version
>
> 2. external table definition of "gdm_m04_ord_det_sum"
>
> 3. If you can provide us some sample data to reproduce the issue, that'd
> be ideal.
>
>
>
> Thanks
>
> -Goden
>
> On Thu, Jul 14, 2016 at 7:28 PM Lei Chang <le...@apache.org> wrote:
>
> I got the email now, looks there are very big delays here, several hours,
> sometimes.
>
> Thanks
> Lei
>
>
> 2016-07-15 0:50 GMT+08:00 Roman Shaposhnik <ro...@shaposhnik.org>:
>
> > You may want to check your spam filter. I saw all these messages.
> >
> > Thanks,
> > Roman.
> >
> > 2016-07-14 6:03 GMT-07:00 Lei Chang <lc...@pivotal.io>:
> >
> > >
> > > FYI, looks dev mailing list does not get this email. forward to
> dev@hawq
> > > again.
> > >
> > >
> > > ---------- Forwarded message ----------
> > > From: 吴彪 <sy...@jd.com>
> > > Date: 2016-07-14 19:05 GMT+08:00
> > > Subject: PXF读取HDFS数据NULL问题
> > > To: dev@hawq.incubator.apache.org
> > > Cc: 周龙波 <zh...@jd.com>, Brian Lu <bl...@pivotal.io>, Lei Chang <
> > > lchang@pivotal.io>
> > >
> > >
> > > 你好,
> > >
> > >        在使用PXF读取HDFS数据时遇到以下问题,HDFS上的数据是通过Hive生成的,:
> > >
> > > 1.    Hive中String类型的字段有些为NULL,hawq 查询会报以下错误;在Hive中把NULL字段处理为非NULL时,在
> hawq
> > > 中可以正常查询。
> > >
> > >
> > >
> > >
> > >
> > > 2.    Hive中Bigint 类型的字段有些为NULL,hawq 查询会报以下错误:
> > >
> > >
> > >
> > >        请问有比较好的方案可以直接处理带有NULL的数据吗?谢谢
> > >
> > >
> > >
> > > 吴彪
> > >
> > > JD.COM 京东 【大数据部-大数据研发部-平台基础研发部】
> > >
> > > ---------------------------------------------------------------
> > >
> > > 地址:北京市朝阳区北辰世纪中心A座10层
> > >
> > > 邮编:100195
> > >
> > > 手机:18810848987
> > >
> > > 邮箱:*sywubiao@jd.com <sy...@jd.com>*
> > >
> > > ----------------------------------------------------------------
> > >
> > > [image: j39]
> > >
> > >
> > >
> > >
> >[image:
> https://ssl.gstatic.com/bt/C3341AA7A1A076756462EE2E5CD71C11/1x/btw_ic_minimize_white_18dp.png]
>
>