You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/04/21 22:19:13 UTC

[GitHub] [pinot] deemoliu opened a new issue, #8579: Data loss in offline table after tenant migration and rebalancing (no server restart)

deemoliu opened a new issue, #8579:
URL: https://github.com/apache/pinot/issues/8579

   We lose some data after migrating table to another tenant and rebalancing with downtime (no server restart). Is server restart a required step after migrating table to another tenant?
   
   We did the following operation to move a table to another tenant 
   1. update tenant server in table config.
   2. rebalance table with downtime.
   
   After rebalanced, the below queries doesn't return result as expected.
   `select count(0), datestr from
   rta.rta.panorama_city_minute_metrics
   where datestr>=‘2022-03-28’
   group by 2 order by 2`
   
   the result is as follows
   `
   count(*) | datestr
   1761336 | 2022-03-28
   1438965 | 2022-03-30
   1751511 | 2022-03-31
   1789216 | 2022-04-01
   1796171 | 2022-04-02
   1772584 | 2022-04-03
   1756764 | 2022-04-04
   1756555 | 2022-04-05
   `
   
   We have rerun the hive to pinot pipelines multiple times (without restarting server), the data on 03-29 is still not queryable. 
   Queries will return different number of records. sometime 03-30 data is also missing. In external view each servers has three servers.
   
   i looked into the servers holding segments that match the date 2022-03-29, and it shows the following
   `
   qiaochu@pinotHost:~$ sudo cat /var/upinot/stream-pinot-server/dataDir/rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000/v3/metadata.properties |grep date
   segment.dimension.column.names = city,completed_avg_surge,completed_avg_surge_sub_0,completed_avg_surge_sub_1,completed_trips,datestr,demand_jobs_accepted_rate_sub_0,demand_jobs_count_driver_canceled,demand_jobs_requested,driver_acceptance_rate,driver_accepts,driver_cancellation_rate,driver_offered_dispatches,eyeball_eta_sum,geodriver_accepted,geodriver_arrived,geodriver_dispatched,geodriver_driving_client,geodriver_open,gross_bookings_usd,helix_sessions,helix_sessions_surged,mean_demand_jobs_ata,mean_demand_jobs_ata_sub_0,mean_demand_jobs_ata_sub_1,mean_eta_post_request,mean_eta_post_request_sub_0,mean_eta_post_request_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_0,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_2,minute,non_requesting_non_zeroed_ss,nonzero,pct_surged_demand_jobs,requesting_sessions,shopping_sessions_v1,summary_completed_sessions,summary_
 driver_cancel,summary_non_stressed_nrs,summary_requesting_sessions,summary_rider_cancel,summary_stressed_nrs,summary_unfulfilled,supply_minutes,surged_demand_jobs_completed_avg_surge,surged_demand_jobs_completed_avg_surge_sub_0,surged_demand_jobs_completed_avg_surge_sub_1,surged_demand_jobs_count,surged_session_avg_surge_bb8,surged_session_avg_surge_bb8_sub_0,surged_session_avg_surge_bb8_sub_1,surged_unique_session_count_bb8_hp,time_bucket,unique_session_count_bb8_hp,zeroed_sessions,zeroed_shopping_sessions_in_sessions_stats,zeros
   segment.datetime.column.names =
   column.datestr.cardinality = 1
   column.datestr.totalDocs = 1438965
   column.datestr.dataType = STRING
   column.datestr.bitsPerElement = 1
   column.datestr.lengthOfEachEntry = 10
   column.datestr.columnType = DIMENSION
   column.datestr.isSorted = true
   column.datestr.hasNullValue = false
   column.datestr.hasDictionary = true
   column.datestr.textIndexType = NONE
   column.datestr.hasInvertedIndex = true
   column.datestr.isSingleValues = true
   column.datestr.maxNumberOfMultiValues = 0
   column.datestr.totalNumberOfEntries = 1438965
   column.datestr.isAutoGenerated = false
   column.datestr.defaultNullValue = null
   column.datestr.minValue = 2022-03-30
   column.datestr.maxValue = 2022-03-30
   `
   the segment rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000 should show datestr.minValue 03-29 but it shows as 03-30.
   
   We rerun the hive to Pinot jobs. then we restart the host, and we get the following
   `
   qiaochu@streampinot-prod12-dca1:~$ sudo ls -l /var/upinot/stream-pinot-server/dataDir/rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000/v3/
   total 143184
   -rwx------ 1 upinot upinot 146537680 Apr  4 20:34 columns.psf
   -rwx------ 1 upinot upinot        16 Apr  4 20:34 creation.meta
   -rwx------ 1 upinot upinot     13571 Apr  4 20:34 index_map
   -rwx------ 1 upinot upinot     60446 Apr  4 20:34 metadata.properties
   qiaochu@streampinot-prod12-dca1:~$ sudo cat /var/upinot/stream-pinot-server/dataDir/rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000/v3/metadata.properties |grep date
   segment.dimension.column.names = city,completed_avg_surge,completed_avg_surge_sub_0,completed_avg_surge_sub_1,completed_trips,datestr,demand_jobs_accepted_rate_sub_0,demand_jobs_count_driver_canceled,demand_jobs_requested,driver_acceptance_rate,driver_accepts,driver_cancellation_rate,driver_offered_dispatches,eyeball_eta_sum,geodriver_accepted,geodriver_arrived,geodriver_dispatched,geodriver_driving_client,geodriver_open,gross_bookings_usd,helix_sessions,helix_sessions_surged,mean_demand_jobs_ata,mean_demand_jobs_ata_sub_0,mean_demand_jobs_ata_sub_1,mean_eta_post_request,mean_eta_post_request_sub_0,mean_eta_post_request_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_0,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_2,minute,non_requesting_non_zeroed_ss,nonzero,pct_surged_demand_jobs,requesting_sessions,shopping_sessions_v1,summary_completed_sessions,summary_
 driver_cancel,summary_non_stressed_nrs,summary_requesting_sessions,summary_rider_cancel,summary_stressed_nrs,summary_unfulfilled,supply_minutes,surged_demand_jobs_completed_avg_surge,surged_demand_jobs_completed_avg_surge_sub_0,surged_demand_jobs_completed_avg_surge_sub_1,surged_demand_jobs_count,surged_session_avg_surge_bb8,surged_session_avg_surge_bb8_sub_0,surged_session_avg_surge_bb8_sub_1,surged_unique_session_count_bb8_hp,time_bucket,unique_session_count_bb8_hp,zeroed_sessions,zeroed_shopping_sessions_in_sessions_stats,zeros
   segment.datetime.column.names =
   column.datestr.cardinality = 1
   column.datestr.totalDocs = 1744810
   column.datestr.dataType = STRING
   column.datestr.bitsPerElement = 1
   column.datestr.lengthOfEachEntry = 10
   column.datestr.columnType = DIMENSION
   column.datestr.isSorted = true
   column.datestr.hasNullValue = false
   column.datestr.hasDictionary = true
   column.datestr.textIndexType = NONE
   column.datestr.hasInvertedIndex = true
   column.datestr.isSingleValues = true
   column.datestr.maxNumberOfMultiValues = 0
   column.datestr.totalNumberOfEntries = 1744810
   column.datestr.isAutoGenerated = false
   column.datestr.defaultNullValue = null
   column.datestr.minValue = 2022-03-29
   column.datestr.maxValue = 2022-03-29
   `
   the query result correct result after restarting the host.
   
   
   We have two questions
   1. what's the root cause that 03-29 data are missing in the table.
   2. Why segment with the 1648598399_1648598399 timestamp in the name, holding the 03-30 day's data before we restart server?
   
   
    
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8579: Data loss in offline table after tenant migration and rebalancing (no server restart)

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8579:
URL: https://github.com/apache/pinot/issues/8579#issuecomment-1105827444

   Did you recently refreshed this segment (push a new segment with same name but different content)? The only explanation to this behavior is that the segment refresh message is somehow not picked up by the server, so the server still serves the old segment. When the server is restarted, it detects the CRC mismatch when trying to load the segment, then downloads the new segment.
   Can you please check the server log for ERROR messages? One possibility is that server is disconnected from the ZK; another possibility is that all the message handling thread is blocked, where you should find some prove by checking the message handling time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] npawar commented on issue #8579: Data loss in offline table after tenant migration and rebalancing (no server restart)

Posted by GitBox <gi...@apache.org>.
npawar commented on issue #8579:
URL: https://github.com/apache/pinot/issues/8579#issuecomment-1116703184

   Hey @deemoliu , what were your further findings? Were the urls updated by some operation on your end, or did it happen because of some Pinot operation that needs to be debugged further?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] deemoliu commented on issue #8579: Data loss in offline table after tenant migration and rebalancing (no server restart)

Posted by GitBox <gi...@apache.org>.
deemoliu commented on issue #8579:
URL: https://github.com/apache/pinot/issues/8579#issuecomment-1105818586

   cc: @yupeng9 @chenboat @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] deemoliu commented on issue #8579: Data loss in offline table after tenant migration and rebalancing (no server restart)

Posted by GitBox <gi...@apache.org>.
deemoliu commented on issue #8579:
URL: https://github.com/apache/pinot/issues/8579#issuecomment-1116546213

   Thanks @Jackie-Jiang for taking a look.
   
   We have successfully reproduce the issue yesterday by rebalancing the table. We find that something updated the hdfs url for this segment.
   
   `/var/log/streaming-pinot-controller/pinot-controller-2022-04-06.2.txt:2022-04-06 22:13:48.490 [jersey-server-managed-async-executor-9] INFO  o.a.p.c.a.r.PinotSegmentUploadDownloadRestletResource  - Setting zkDownloadUri: to viewfs://ns-default/user/hive/warehouse/marketplace_lts.db/pinot/panorama_city_minute_vvid_metrics/production/2022-04-06/pinot_archived/rta_panorama_city_minute_vvid_metrics_00000.tar.gz for segment: rta_panorama_city_minute_vvid_metrics_1649289599_1649289599_00000 of table: rta_panorama_city_minute_vvid_metrics_OFFLINE, skipping move`
   
   `/var/log/streaming-pinot-controller/pinot-controller-2022-04-07.0.txt:2022-04-07 03:33:39.453 [jersey-server-managed-async-executor-2] INFO  o.a.p.c.a.r.PinotSegmentUploadDownloadRestletResource  - Setting zkDownloadUri: to viewfs://ns-default/user/hive/warehouse/marketplace_lts.db/pinot/panorama_city_minute_vvid_metrics/production/2022-04-07/pinot_archived/rta_panorama_city_minute_vvid_metrics_00000.tar.gz for segment: rta_panorama_city_minute_vvid_metrics_1649289599_1649289599_00000 of table: rta_panorama_city_minute_vvid_metrics_OFFLINE, skipping move`
   
   i will look into the logs to see why the hdfs url was updated.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] deemoliu commented on issue #8579: Data loss in offline table after tenant migration and rebalancing (no server restart)

Posted by GitBox <gi...@apache.org>.
deemoliu commented on issue #8579:
URL: https://github.com/apache/pinot/issues/8579#issuecomment-1116774756

   hi @npawar thanks for looking into the issue. I think the URL is updated by some user operation. The user has three hive to Pinot jobs that used the same input hive table and write to the same pinot table. I will understand more on the use case and get back to you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org