Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/15 18:39:10 UTC

[GitHub] [pinot] walterddr opened a new issue #7910: Improve segment-level DebugInfo

walterddr opened a new issue #7910:
URL: https://github.com/apache/pinot/issues/7910


   This issue was discussed partially in https://github.com/apache/pinot/pull/7896.
   
   The issue we originally discovered was that the debug API returned DebugInfo with a null stack trace (see below). We found two places where the stack trace was swallowed.
   
   While addressing this by throwing the error explicitly, we realized that the control flow was not as expected. There are several concerns:
   @sajjad-moradi : If we throw an exception in `buildSegmentAndReplace` without correctly capturing it and changing state, the segment goes to the error state in the external view upon receiving the Helix transition message consuming -> online. See https://github.com/apache/pinot/pull/7896#discussion_r768340557
   @mcvsubbu : status.ERROR should not be used when a segment creation was discarded, because another host might have been able to build it correctly. See: https://github.com/apache/pinot/pull/7896#discussion_r769141107; and when we detect a recoverable error, we should not include that information in the debug info page in the first place. See: https://github.com/apache/pinot/pull/7896#discussion_r769160523
   
   There's another question that @Jackie-Jiang and I noticed: segment errors are captured and put in the debug API in almost all methods (public/protected/private). Can we clean up the logic with a rule so that
   only external-facing methods capture and log debug info, while all internal methods only bubble up the exception for the external-facing methods to capture and log with more contextual information?
   
   Please discuss. 
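   
   A minimal sketch of what the rule above could look like, for discussion only. Every name here (`SingleCapturePointSketch`, `transformRow`, `buildInternal`, `buildSegment`, `recordedErrors`) is hypothetical and not taken from the Pinot code base; the list of strings is just a stand-in for the per-segment debug record.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   
   // Hypothetical sketch; none of these names are Pinot's actual classes or methods.
   final class SingleCapturePointSketch {
     // Stand-in for the per-segment error record exposed by the debug API.
     final List<String> recordedErrors = new ArrayList<>();
   
     // Internal method: no catch and no debug-info bookkeeping; failures bubble up.
     private void transformRow(String timeValue) {
       if (!timeValue.matches("\\d+")) {
         throw new IllegalArgumentException("Unparseable time value: " + timeValue);
       }
     }
   
     // Internal method: delegates and lets any exception propagate unchanged.
     private void buildInternal(List<String> timeValues) {
       for (String value : timeValues) {
         transformRow(value);
       }
     }
   
     // External-facing method: the only place that captures and records the error,
     // adding context (the segment name) that the internal methods do not have.
     public boolean buildSegment(String segmentName, List<String> timeValues) {
       try {
         buildInternal(timeValues);
         return true;
       } catch (RuntimeException e) {
         recordedErrors.add("Could not build segment " + segmentName + ": " + e);
         return false;
       }
     }
   
     public static void main(String[] args) {
       SingleCapturePointSketch sketch = new SingleCapturePointSketch();
       sketch.buildSegment("testTable__0__0", Arrays.asList("1639125751331", "not-a-time"));
       System.out.println(sketch.recordedErrors);
     }
   }
   ```
   
   With this shape each failure is recorded exactly once, at the boundary, instead of at every level it passes through.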
   
   
   
   <details>
     <summary>Click to expand code block!</summary>
   
   ```
   [
     {
       "tableName": "testTable_REALTIME",
       "numSegments": 1,
       "numServers": 1,
       "numBrokers": 1,
       "segmentDebugInfos": [
         {
           "segmentName": "testTable__0__0__20211210T0838Z",
           "serverState": {
             "Server_localhost_8098": {
               "idealState": "CONSUMING",
               "externalView": "CONSUMING",
               "segmentSize": "0 bytes",
               "consumerInfo": {
                 "segmentName": "testTable__0__0__20211210T0838Z",
                 "consumerState": "NOT_CONSUMING",
                 "lastConsumedTimestamp": 1639125751331,
                 "partitionToOffsetMap": {
                   "0": "325618872"
                 }
               },
               "errorInfo": {
                 "timestamp": "2021-12-10 08:42:35 UTC",
                 "errorMessage": "Could not build segment",
                 "stackTrace": null
               }
             }
           }
         }
       ],
       "serverDebugInfos": [],
       "brokerDebugInfos": [
         {
           "brokerName": "Broker_localhost_8099",
           "idealState": "ONLINE",
           "externalView": "ONLINE"
         }
       ],
       "tableSize": {
         "reportedSize": "0 bytes",
         "estimatedSize": "0 bytes"
       },
       "ingestionStatus": {
         "ingestionState": "UNHEALTHY",
         "errorMessage": "Segment: testTable__0__0__20211210T0838Z is not being consumed on server: Server_localhost_8098"
       }
     }
   ]
   ```
   </details>
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mcvsubbu commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-996384351


   I remember we had some issues early on: when segment build took a long time for whatever reason, it went into an infinite loop. Replica A timed out, replica B took over, then replica B timed out, it came back to replica A, and so on. To avoid the infinite loop, we save the built segment, and if we are asked to build the same segment again, we commit the segment already built. I don't want this bug to be un-fixed. Just bringing it up as something to keep in mind when improving the debug API.
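   
   Roughly, the "save and reuse" behavior described above might be sketched like this. It is only an illustration under assumptions: `BuiltSegmentCacheSketch`, `buildOrReuse`, and the string standing in for a built segment are all hypothetical, and the real server code is certainly more involved.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.Function;
   
   // Hypothetical sketch; not the actual Pinot implementation.
   final class BuiltSegmentCacheSketch {
     // Segment name -> descriptor of the already-built segment (a plain string here).
     private final Map<String, String> builtSegments = new HashMap<>();
     private int buildCount = 0;
   
     // Builds the segment the first time it is requested; later requests for the
     // same segment name reuse the saved result instead of building again.
     synchronized String buildOrReuse(String segmentName, Function<String, String> builder) {
       String existing = builtSegments.get(segmentName);
       if (existing != null) {
         return existing;
       }
       buildCount++;
       String built = builder.apply(segmentName);
       builtSegments.put(segmentName, built);
       return built;
     }
   
     synchronized int getBuildCount() {
       return buildCount;
     }
   
     public static void main(String[] args) {
       BuiltSegmentCacheSketch cache = new BuiltSegmentCacheSketch();
       cache.buildOrReuse("testTable__0__0", name -> name + ".built");
       cache.buildOrReuse("testTable__0__0", name -> name + ".built");
       System.out.println("builds performed: " + cache.getBuildCount());
     }
   }
   ```
   
   The point for the debug API work is that a change in how build failures are reported should not bypass this reuse path.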
   
   I realize there was a null stack trace.  The question is, did something malfunction? Did you gather a trace from the server to see what it was?




[GitHub] [pinot] walterddr commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-996853985


   Yes, we later checked the server log and found that the ingested data did not conform to the time format provided to the ingestion transformation function.




[GitHub] [pinot] walterddr edited a comment on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
walterddr edited a comment on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-996853985


   Yes, we later checked the server log and found that the ingested data did not conform to the time format provided to the ingestion transformation function. However, having to dig into the server logs for this is rather inconvenient.




[GitHub] [pinot] mcvsubbu commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-996902090


   Would you be able to copy/paste the server exception log? Just want to make sure that the particular execution path is covered in our solution. If ingested data was the problem, I suppose the stack would have been where we ingest data and transform it? 




[GitHub] [pinot] walterddr commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-998893883


   Sorry, I don't have the server exception log readily available, but we've seen this errorInfo from the debug endpoint multiple times: `"errorInfo": {"errorMessage": "Could not build segment", "stackTrace": null}`.
   
   This is what I am trying to address:
   Searching through the code path, there are only 2 places where this could happen. Both use the `buildSegmentInternal` method but report the error only after the exception has already been caught, so the stack trace is never recorded. Without this information it is very difficult to debug what's going on.
   #7909 addressed this by moving the error collection inside `buildSegmentInternal`.
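   
   To illustrate the idea (this is a sketch, not the code in #7909): the exception is recorded at the catch site, where the stack trace is still available, and the method keeps returning `null` on failure. Only the name `buildSegmentInternal` comes from this thread; `RecordWhereCaughtSketch`, `recordError`, the fields, and the `simulateFailure` flag are made up for the example.
   
   ```java
   import java.io.PrintWriter;
   import java.io.StringWriter;
   
   // Hypothetical sketch; only the method name buildSegmentInternal comes from the thread.
   final class RecordWhereCaughtSketch {
     String lastErrorMessage;
     String lastStackTrace;
   
     // Keeps the existing "null means failure" contract, but records the
     // exception where it is caught, while the stack trace is still available.
     String buildSegmentInternal(String segmentName, boolean simulateFailure) {
       try {
         if (simulateFailure) {
           throw new IllegalStateException("simulated build failure");
         }
         return segmentName + ".built";
       } catch (RuntimeException e) {
         recordError("Could not build segment", e);
         return null;
       }
     }
   
     // Renders the full stack trace to a string so it can be stored and returned later.
     private void recordError(String message, Throwable t) {
       StringWriter writer = new StringWriter();
       t.printStackTrace(new PrintWriter(writer, true));
       lastErrorMessage = message;
       lastStackTrace = writer.toString();
     }
   
     public static void main(String[] args) {
       RecordWhereCaughtSketch sketch = new RecordWhereCaughtSketch();
       sketch.buildSegmentInternal("testTable__0__0", true);
       System.out.println(sketch.lastErrorMessage);
       System.out.println(sketch.lastStackTrace);
     }
   }
   ```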
   




[GitHub] [pinot] walterddr commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-995364416


   @mcvsubbu could you please comment on this issue? 




[GitHub] [pinot] mcvsubbu commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-996382412


   Can we log/add the exception where it is caught and continue the existing interface?




[GitHub] [pinot] walterddr commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-996857528


   > Can we log/add the exception where it is caught and continue the existing interface?
   
   https://github.com/apache/pinot/pull/7909/ aims at exactly this: catch and log the exception where it occurs, and keep the existing interface without changing the behavior.




[GitHub] [pinot] Jackie-Jiang commented on issue #7910: Improve segment-level DebugInfo

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #7910:
URL: https://github.com/apache/pinot/issues/7910#issuecomment-995396789


   @mcvsubbu Essentially we want to change how exceptions are passed. Currently the exception is caught within the method, and the failure is signaled through a boolean `false` or a `null` return value. The problem with this approach is that we lose the stack trace, because it is not part of the return value. The proposal here is to throw the exception so that the caller can catch and handle it with the full stack trace.
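   
   A small side-by-side sketch of the two styles, under assumptions: `ExceptionPassingSketch`, `ErrorInfo`, `convertTime`, and the two caller methods are hypothetical names, and `Long.parseLong` merely stands in for whatever step fails during the build.
   
   ```java
   // Hypothetical sketch contrasting the two styles; not Pinot's actual code.
   final class ExceptionPassingSketch {
     // Stand-in for the error record shown by the debug endpoint.
     static final class ErrorInfo {
       final String errorMessage;
       final Throwable cause; // null when the exception was lost on the way up
   
       ErrorInfo(String errorMessage, Throwable cause) {
         this.errorMessage = errorMessage;
         this.cause = cause;
       }
     }
   
     // Simulated failing step: throws NumberFormatException on bad input.
     private static void convertTime(String timeValue) {
       Long.parseLong(timeValue);
     }
   
     // Current style: the exception is caught inside and reduced to `false`.
     static boolean buildReturningFlag(String timeValue) {
       try {
         convertTime(timeValue);
         return true;
       } catch (RuntimeException e) {
         return false; // the stack trace is dropped here
       }
     }
   
     // Proposed style: the exception propagates to the caller.
     static void buildThrowing(String timeValue) {
       convertTime(timeValue);
     }
   
     // A caller of the flag style only knows that something failed.
     static ErrorInfo callerWithFlag(String timeValue) {
       return buildReturningFlag(timeValue) ? null : new ErrorInfo("Could not build segment", null);
     }
   
     // A caller of the throwing style can keep the exception and its stack trace.
     static ErrorInfo callerWithException(String timeValue) {
       try {
         buildThrowing(timeValue);
         return null;
       } catch (RuntimeException e) {
         return new ErrorInfo("Could not build segment", e);
       }
     }
   
     public static void main(String[] args) {
       System.out.println("flag style cause: " + callerWithFlag("not-a-time").cause);
       System.out.println("throwing style cause: " + callerWithException("not-a-time").cause);
     }
   }
   ```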

