Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/11/17 00:39:03 UTC

[GitHub] [pinot] dongxiaoman opened a new issue #7779: Real time tiered storage feature causing temporary offline segments

dongxiaoman opened a new issue #7779:
URL: https://github.com/apache/pinot/issues/7779


   NOTE: This is not an urgent bug, but it is quite annoying if we can confirm it is the root cause.
   
   Right now we see a clear correlation between query failures (currently any offline segment can cause a failure) and a realtime segment having been moved to another storage tier seconds earlier.
   
   We have a query error for a missing segment at `"timestamp":"2021-11-16T22:13:18.256Z"`,
   and 5 seconds earlier we see logs indicating the segment was dropped from the realtime server. A few minutes later we see the same segment showing up on the servers of its new tier.
   
   The log for the drop on the streaming server is:
   ```
   2021/11/16 22:13:15.345 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Instance Server_st-fw-81.service.consul_8098, partition point_entry__34__576__20211116T0930Z received state transition from OFFLINE to DROPPED on session 30043ed1a3604f2, message id: d9310a75-1742-4758-981e-32c0b193f7eb
   ```
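To make the timing comparison reproducible, here is a minimal Python sketch that extracts the timestamp and the state transition from the log line above and computes the gap to the query error timestamp. It assumes both timestamps are in the same timezone (UTC is assumed here, which the log line itself does not confirm); the function names are hypothetical helpers, not part of Pinot or Helix.

```python
import re
from datetime import datetime, timezone

# Log line and error timestamp copied from the report above.
LOG_LINE = (
    "2021/11/16 22:13:15.345 INFO [HelixStateTransitionHandler] "
    "[HelixTaskExecutor-message_handle_STATE_TRANSITION] Instance "
    "Server_st-fw-81.service.consul_8098, partition "
    "point_entry__34__576__20211116T0930Z received state transition "
    "from OFFLINE to DROPPED on session 30043ed1a3604f2, "
    "message id: d9310a75-1742-4758-981e-32c0b193f7eb"
)
QUERY_ERROR_TS = "2021-11-16T22:13:18.256Z"


def parse_log_time(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS.mmm' prefix (assumed to be UTC)."""
    stamp = datetime.strptime(line[:23], "%Y/%m/%d %H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc)


def parse_error_time(ts):
    """Parse the ISO-style timestamp reported with the query error."""
    stamp = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return stamp.replace(tzinfo=timezone.utc)


def parse_transition(line):
    """Return (segment, from_state, to_state) for a state transition log line."""
    match = re.search(
        r"partition (\S+) received state transition from (\w+) to (\w+)", line
    )
    if match is None:
        raise ValueError("not a state transition log line")
    return match.group(1), match.group(2), match.group(3)


def gap_seconds(line, error_ts):
    """Seconds between the log line and the query error."""
    return (parse_error_time(error_ts) - parse_log_time(line)).total_seconds()


if __name__ == "__main__":
    print(parse_transition(LOG_LINE))
    print(gap_seconds(LOG_LINE, QUERY_ERROR_TS))
```

For this particular line the gap is roughly 2.9 seconds under the UTC assumption. Running the same helpers over the full server log would show the order of all transitions for the segment, not only the final OFFLINE to DROPPED step quoted here.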
   
   My mental model of the cause is:
   1. The segment is scheduled to move to another tier due to TTL.
   2. The segment is dropped from the realtime server, but the new tier has not yet completed the "ONLINE" transition for that segment.
   3. The segment appears offline to the Pinot controller; a query comes in, and the brokers (or the servers?) complain about segments missing from the realtime server.
   
   Step 3 is still a bit strange: did the broker not receive the segment's external view change event within 5 seconds? The segment is going to show up in another storage tier.
   
   If we think of moving a segment between storage tiers as a "rebalance", we should have the option of a "no-downtime" move of segments into another tier: keep one replica in place, ensure the new replica shows up, and then move the other.
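The difference between the suspected behavior and the proposed "no-downtime" move can be illustrated with a small simulation. This is only a sketch of the idea in Python, not Pinot's rebalance logic: the host names are made up, and `plan_naive_move`, `plan_no_downtime_move`, and `min_online_replicas` are hypothetical helpers. It assumes each "add" step completes (the replica is ONLINE) before the next step starts.

```python
def plan_naive_move(old_hosts, new_hosts):
    """Drop every old replica first, then bring up the new ones."""
    steps = [("drop", host) for host in old_hosts]
    steps += [("add", host) for host in new_hosts]
    return steps


def plan_no_downtime_move(old_hosts, new_hosts):
    """Move one replica at a time: drop one old replica, wait for its
    replacement to come ONLINE, and only then touch the next replica."""
    if len(old_hosts) != len(new_hosts):
        raise ValueError("expected the same replica count on both tiers")
    if len(old_hosts) < 2:
        raise ValueError("need at least 2 replicas to keep one in place")
    steps = []
    for old, new in zip(old_hosts, new_hosts):
        steps.append(("drop", old))
        steps.append(("add", new))
    return steps


def min_online_replicas(old_hosts, steps):
    """Replay the steps and return the lowest ONLINE replica count seen."""
    online = set(old_hosts)
    lowest = len(online)
    for action, host in steps:
        if action == "drop":
            online.discard(host)
        else:
            online.add(host)
        lowest = min(lowest, len(online))
    return lowest


if __name__ == "__main__":
    # Hypothetical host names for a table with replication 2.
    realtime = ["realtime-1", "realtime-2"]
    tier = ["tier-1", "tier-2"]
    print(min_online_replicas(realtime, plan_naive_move(realtime, tier)))
    print(min_online_replicas(realtime, plan_no_downtime_move(realtime, tier)))
```

With two replicas, the naive plan passes through a state with zero ONLINE replicas, while the one-at-a-time plan never drops below one. The cost is that the move takes longer, since each replacement has to finish loading before the next replica is dropped.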
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang edited a comment on issue #7779: Real time tiered storage feature causing temporary offline segments

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang edited a comment on issue #7779:
URL: https://github.com/apache/pinot/issues/7779#issuecomment-972288791


   What is the replication setting for this table? When moving segments across tiers, there should be no downtime if there is more than one replica.
   Also, can you please share the query error you got? If it is `BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE (305)`, you should be able to find a warning of the form `Failed to find servers hosting segment: ...` on the broker.
   This information can help debug the issue.
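As a rough illustration of the check suggested above, the following Python sketch filters a broker response for entries with error code 305. It assumes the response has been parsed into a dict with an `exceptions` list whose entries carry `errorCode` and `message` fields; the sample response and its message text are invented for the example and are not taken from this issue.

```python
# Error code named in the comment above.
BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE = 305


def unavailable_segment_errors(response):
    """Return the exception entries that report unavailable segments.

    `response` is assumed to be the broker's JSON response parsed into a dict.
    """
    return [
        entry
        for entry in response.get("exceptions", [])
        if entry.get("errorCode") == BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE
    ]


if __name__ == "__main__":
    # Hypothetical response, for illustration only.
    sample = {
        "exceptions": [
            {"errorCode": 305, "message": "(hypothetical) segments unavailable"},
            {"errorCode": 200, "message": "(hypothetical) unrelated error"},
        ]
    }
    for entry in unavailable_segment_errors(sample):
        print(entry["errorCode"], entry["message"])
```

If this returns a non-empty list for the failing query, the matching warning on the broker should name the affected segments.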



[GitHub] [pinot] dongxiaoman closed issue #7779: Real time tiered storage feature causing temporary offline segments

Posted by GitBox <gi...@apache.org>.
dongxiaoman closed issue #7779:
URL: https://github.com/apache/pinot/issues/7779


   



[GitHub] [pinot] dongxiaoman commented on issue #7779: Real time tiered storage feature causing temporary offline segments

Posted by GitBox <gi...@apache.org>.
dongxiaoman commented on issue #7779:
URL: https://github.com/apache/pinot/issues/7779#issuecomment-999962600


   The replication factor is 2.
   At the time this error was reported, our Pinot Java Client enforced a rule that throws an exception whenever even one segment is not processed. The rule is strict, so we saw many errors.
   For now let's close this issue; we can re-open it if there are similar complaints.

