Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/02/02 20:04:14 UTC

[GitHub] [incubator-pinot] daniellavoie opened a new issue #6524: Provide table ingestion status through API

daniellavoie opened a new issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524


   # Summary
   
   Operating Pinot on behalf of data engineers is a challenge when these users only have access to the Controller endpoints. SREs aren't necessarily aware of the details of the table configurations created by those users. If something goes wrong, data engineers have no way to get the details of the error other than requesting a log extraction from the SREs.
   
   # Observed use cases
   
   ## Realtime table ingestion
   
   Any Kafka connectivity issue coming from a bad server, port or credential will not be reported back in the table status.
   
   ## BatchConfig
   
   Recently, on `0-7-0.SNAPSHOT`, the new `batchIngestionConfig` allows minion tasks to ingest data. That ingestion may fail for various reasons tied to user-provided configuration (invalid S3 or GCS credentials, etc.). Sadly, when nothing is being ingested, the only way for a data engineer to learn the root cause is to ask an SRE to investigate the Pinot logs.
   
   # Suggestion
   
   Include a table ingestion state as part of the table status.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-785295453


   @daniellavoie the freshness metric is returned on a per-query basis. Users issue queries and, in the response, get metadata about:
   - Freshness
   - Partial response
   
   If either of these is not in line with what the user expects (e.g. the freshness timestamp should not be older than the current time minus 20 minutes), they can contact an SRE to look into the problem. 
   
   Does this model work for you?
   
   Even if we open up some basic table information to users, the users will need to come to the SREs to make sense of the message anyway. What you are looking for is a smart log processing service that analyzes logs, splits them on a per-table basis, consolidates logs for the same table across different partitions and servers, reaches some "smart" conclusion about what may be going wrong, and outputs that to the user in one sentence or paragraph, with an action to be taken.


[GitHub] [incubator-pinot] kishoreg commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-785308520


   I feel there are two parts to this
   1. Need for an API to know the ingestionStatus of a table
   2. How do we implement that API and what should we use for that
   
   I think having 1 is important and it's user-friendly. It allows us to internally call multiple endpoints, queries, etc. to present a complete overview of what's going on with the table. I am happy to have an overall status API for a table, with the ingestion status as one part of the response.
   
   Freshness, partial response, and actually testing the connection information are part of the implementation. We can start with what makes sense and enhance as needed, but having a standard endpoint is a good start.
   
   


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-772095947


   We have a metric that is set to 1 if a stream partition is consuming correctly, 0 otherwise. At LinkedIn, we set an alert if the consumption metric stays at 0 for more than some period of time.
   
   Note that if the consumption fails for some stream partitions, the Realtime segment checker in the controller automatically attempts to restart consumption on the partitions. You can set the frequency of the automatic check, and set the alerts to fire after that time interval (if it is still not fixed).
   
   https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/metrics/ServerGauge.java#L31:3
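The alerting rule described in this comment can be sketched as follows. This is only an illustration of the idea, not Pinot code: the class name, method name, and threshold are hypothetical, and the gauge is modelled as a plain 0/1 value sampled at known timestamps. The threshold would typically be set longer than the controller's automatic check interval, so the alert only fires if the restart attempt did not fix consumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StalledConsumptionDetector:
    """Hypothetical helper: flags a partition whose 0/1 consuming gauge
    has stayed at 0 for longer than max_stalled_seconds."""

    max_stalled_seconds: float
    # Timestamp of the first 0 sample in the current run of zeros, if any.
    _stalled_since: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, timestamp: float, gauge_value: int) -> bool:
        """Record one gauge sample and return True if an alert should fire."""
        if gauge_value == 1:
            # Consumption is healthy again, so reset the stall window.
            self._stalled_since = None
            return False
        if self._stalled_since is None:
            self._stalled_since = timestamp
        return timestamp - self._stalled_since > self.max_stalled_seconds
```

A caller would feed each scraped sample to `observe` and page someone only when it returns `True`. Note that this still only says that something is wrong, not what is wrong.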


[GitHub] [incubator-pinot] daniellavoie commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
daniellavoie commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-772159802


   Counters, metrics, and gauges do not tell the story of what is wrong. Health check observability is not the problem I am trying to solve here. The client needs to see the error message related to the bad configuration, not just whether their config is good or bad.


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-824199231


   Also see Issue 4035


[GitHub] [incubator-pinot] icefury71 commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
icefury71 commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-823768688


   Added a high-level design document for this issue. At the moment, I've only considered Pinot Realtime tables. Once the high-level approach is agreed upon, I will extend this to Minion-based ingestion of Offline tables as well. 
   
   You can find the design here: https://docs.google.com/document/d/12w6rEJBRKACKomSdL871GCjTtzLxY1N7kYE308RT8JY/edit


[GitHub] [incubator-pinot] daniellavoie edited a comment on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
daniellavoie edited a comment on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-785307895


   I'm trying to build a model where my users can self-troubleshoot something like a bad Kafka URL or bad credentials. These configs are user-provided, and we should be able to give users feedback when they are wrong. The same goes for the newly available minion-based ingestion. Once a task is triggered, the only way to find out that your S3 bucket has permission issues is to call the SRE, who has no context about what you are trying to do, and have them look through every controller log.
   
   I agree that not all errors will be debuggable by users, and some will involve an SRE. Still, I would like to at least reduce the number of cases in which an SRE is required to even begin investigating a fat-finger config issue.


[GitHub] [incubator-pinot] daniellavoie commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
daniellavoie commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-772102874


   Thanks for these details @mcvsubbu. My use case is more about providing feedback to the user who created the table config through the REST API. Metrics are intended to be monitored by SREs, but if I don't have access to logs, there is no way for me to understand what is wrong with the table config without calling the SRE running the cluster.


[GitHub] [incubator-pinot] johighley commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
johighley commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-797682547


   I was told to add this use case to this issue:  We're ingesting millions of records via Kafka to a realtime table as fast as Pinot can consume them.  It's 1 record per message.  We need some way to monitor the message ingestion for failures, especially due to data issues (ex: alpha chars in numeric field, missing data, or json format issue).  Looking through server logs isn't practical.  Ideally, Pinot would provide the raw message and the error it encountered.
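To make the request concrete, here is a minimal sketch of per-record error capture: each Kafka message is parsed on its own, and any failure is stored together with the raw message instead of only being logged. Everything here is an assumption for illustration (the `IngestionError` and `parse_records` names, the JSON message format, and the simple type-based schema); it does not describe how Pinot's consumers are implemented.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class IngestionError:
    """Hypothetical error report: the raw message plus what went wrong."""
    raw_message: str
    error: str


def parse_records(
    raw_messages: Iterable[str], schema: Dict[str, type]
) -> Tuple[List[Dict[str, Any]], List[IngestionError]]:
    """Parse one JSON record per message, keeping failures with their raw input."""
    rows: List[Dict[str, Any]] = []
    errors: List[IngestionError] = []
    for raw in raw_messages:
        try:
            record = json.loads(raw)  # malformed JSON raises a ValueError subclass
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
            row = {}
            for column, column_type in schema.items():
                if record.get(column) is None:
                    raise ValueError(f"missing value for column '{column}'")
                # e.g. int("x7") raises ValueError for alpha chars in a numeric field
                row[column] = column_type(record[column])
            rows.append(row)
        except (ValueError, TypeError) as exc:
            errors.append(IngestionError(raw_message=raw, error=str(exc)))
    return rows, errors
```

The collected `errors` list is the kind of output that could then be surfaced per table rather than left in server logs.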


[GitHub] [incubator-pinot] daniellavoie commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
daniellavoie commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-774816316


   > hmm, in this case, shall we actually provide an API `/logs` in each pinot component to fetch the log file?
   
   This will not solve the problem I'm trying to frame. First of all, a `/logs` endpoint will expose everything and still only means something to an operator. My persona is not an SRE but an actual data administrator. To some extent, it could even be a webapp that manages table configs on top of Pinot. If each table maintains the state of its ingestion status along with any error report (e.g. the Kafka bootstrap server is unavailable), this will greatly improve the ability of Pinot users (again, not SREs) to self-troubleshoot configuration errors or connectivity errors (Kafka not available). Different tables may have different ingestion statuses and error root causes. One table may have an S3 credential error while a second is failing to communicate with Kafka. This status could be a nested `status` field within the `/table/{tableName}` output, or maybe even a `/table/{tableName}/ingestion-status` endpoint. 
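As a rough illustration of the proposal, the sketch below keeps the latest ingestion state per table and renders the kind of payload a `/table/{tableName}/ingestion-status` endpoint might return. The field names (`ingestionStatus`, `state`, `error_message`), the state values, and the registry class are all hypothetical; no such response format is defined anywhere in this thread.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class IngestionStatus:
    state: str  # hypothetical values: "HEALTHY", "UNHEALTHY", "UNKNOWN"
    error_message: Optional[str] = None


class IngestionStatusRegistry:
    """Hypothetical in-memory store of the latest ingestion state per table."""

    def __init__(self) -> None:
        self._by_table: Dict[str, IngestionStatus] = {}

    def report_healthy(self, table_name: str) -> None:
        self._by_table[table_name] = IngestionStatus("HEALTHY")

    def report_failure(self, table_name: str, error_message: str) -> None:
        # Keep the user-facing root cause, e.g. a rejected credential.
        self._by_table[table_name] = IngestionStatus("UNHEALTHY", error_message)

    def status_payload(self, table_name: str) -> str:
        """Build the JSON body for one table; unreported tables are UNKNOWN."""
        status = self._by_table.get(table_name, IngestionStatus("UNKNOWN"))
        body = {"tableName": table_name, "ingestionStatus": asdict(status)}
        return json.dumps(body, sort_keys=True)
```

Because the state is tracked per table, one table can report an S3 credential error while another reports a Kafka connectivity error, and each user only sees the message relevant to their own config.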
   
   My last comment regarding a log file approach: exposing a `/logs` endpoint is not really feasible since logging is configured independently with log4j, so the app is not aware of where the logs are actually written. Keep in mind that best practices recommend sending logs to stdout rather than a specific log file. In any case, the log approach should not be considered.
   
   cc @kishoreg 


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-772158231


   We also return metadata to the broker about the newest timestamp we have consumed. It is set in the BrokerResponseNative as minConsumingFreshnessTimeMs, the lowest timestamp among the most recent records consumed in each segment (i.e. if segments g1 and g2 are queried, and the timestamps of the most recent record in each are t1 and t2 respectively, then we return min(t1, t2)).
   
   The client can monitor the freshness and detect when the minimum is too far behind the current time.
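The min(t1, t2) rule and the client-side check can be sketched like this. The function names and the lag threshold are hypothetical; only the idea of taking the minimum of the per-segment latest timestamps comes from the comment above, and the 20-minute figure is the example mentioned elsewhere in this thread.

```python
from typing import Iterable


def min_consuming_freshness_ms(latest_record_timestamps_ms: Iterable[int]) -> int:
    """Lowest of the most-recent-record timestamps across the queried segments."""
    return min(latest_record_timestamps_ms)


def is_stale(min_freshness_ms: int, now_ms: int, max_lag_ms: int) -> bool:
    """True when the least-fresh segment lags the current time by more than max_lag_ms."""
    return now_ms - min_freshness_ms > max_lag_ms


# Example: segments g1 and g2 with latest record timestamps t1 and t2.
t1, t2 = 1_000_000, 1_600_000
freshness = min_consuming_freshness_ms([t1, t2])  # equals t1, the older of the two
twenty_minutes_ms = 20 * 60 * 1000
stale = is_stale(freshness, now_ms=2_300_000, max_lag_ms=twenty_minutes_ms)
```

Taking the minimum means a single lagging segment is enough to mark the whole response as stale, which is the conservative choice for a freshness guarantee.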


[GitHub] [incubator-pinot] daniellavoie commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
daniellavoie commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-824077963


   I like this design very very much! 


[GitHub] [incubator-pinot] fx19880617 commented on issue #6524: Provide table ingestion status through API

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6524:
URL: https://github.com/apache/incubator-pinot/issues/6524#issuecomment-774642690


   hmm, in this case, shall we actually provide an API `/logs` in each pinot component to fetch the log file?
   

