You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/24 23:31:39 UTC

[GitHub] [incubator-pinot] sajjad-moradi opened a new issue #6966: Issues with recent changes on upload segment endpoint

sajjad-moradi opened a new issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966


   The changes introduced in this [PR](https://github.com/apache/incubator-pinot/pull/6567/files#diff-d0a60155908b7919915de944c49dfad57e2e5e1cc7ad2988176ac090e57567d7R299) caused the following problem for Refresh use cases in LinkedIn:
   - The segment directory in deep store is changed from `tableName/segmentName` to `tableName_OFFLINE/segmentName`. This means that for refresh use cases, the new segments will go to `tableName_OFFLINE` directory, but the older version of the same segment will remain undeleted in `tableName` directory. This is problematic because the older versions need to be manually deleted and now they need extra storage.
   
   Also by just looking at the code, we believe SegmentDeletionManager cannot function properly as it uses rawTableName to construct the uri for the segments to be deleted (details [here](https://github.com/apache/incubator-pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/SegmentDeletionManager.java#L179)).
   
   Please note that with current code, here's the state of the deep store:
   - the realtime segments pushed from RT servers use a separate API and then finally reside in `tableName/segmentName` directory.
   - the realtime segments for upsert use cases will land in `tableName_REALTIME/segmentName` directory.
   - the offline segments will land in `tableName_OFFLINE/segmentName` directory.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
chenboat commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-847507284


   I suppose there is already some offline table segments being pushed to _tableName_OFFLINE/segmentName_ dir. Do you need to clean them? Or you have other manual means to clean them? If you want to clean them using Pinot services, I proposed to have the SegmentDeletionManager to delete the extra dir's segments as well. Otherwise, yes I agree we can reverting to the old directory layout. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-847457949


   > I believe the problem is on `PinotSegmentUploadDownloadRestletResource` line 295 where `tableNameWithType` is mistakenly passed to the `getZkDownloadURIForSegmentUpload()` where `rawTableName` is expected.
   > 
   > Since this PR has been merged for more than 1 months, should we consider changing the behavior to use `tableNameWithType` in the segment path, or revert it back to the old behavior of using `rawTableName`?
   > 
   > The advantage of using `tableNameWithType` in the segment path is to separate the segments for offline and realtime, and allow same segment name for offline and realtime table (might not be desired though). But this is changing the existing behavior, and might cause other classes to not work properly (such as `SegmentDeletionManager` mentioned above)
   
   Exactly. We should fix the controller to do the right thing -- drop the segment in the right place. Otherwise, we open up to other issues that may only surface much later, making production systems take a huge repair time. Right now, the only thing we could find is retention (the new segments uploaded will not get retained out).  But there is no saying what else is out there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-847431113


   I believe the problem is on `PinotSegmentUploadDownloadRestletResource` line 295 where `tableNameWithType` is mistakenly passed to the `getZkDownloadURIForSegmentUpload()` where `rawTableName` is expected.
   
   Since this PR has been merged for more than 1 months, should we consider changing the behavior to use `tableNameWithType` in the segment path, or revert it back to the old behavior of using `rawTableName`?
   
   The advantage of using `tableNameWithType` in the segment path is to separate the segments for offline and realtime, and allow same segment name for offline and realtime table (might not be desired though). But this is changing the existing behavior, and might cause other classes to not work properly (such as `SegmentDeletionManager` mentioned above)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] sajjad-moradi commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
sajjad-moradi commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-848013436


   AFAIK SegmentDeletionManager is used for deleting active segments that need to be deleted because of their retention time. Other than retention issue, there's another problem. For refresh use cases, the new segments go to tableName_OFFLINE/segmentName dir, but the older version remain undeleted under tableName/segmentName dir. This exact issue also exists in case of purge task. The purged segments go to the new directory, while the non-purged version of the same segments remain under old directory. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] sajjad-moradi commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
sajjad-moradi commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-847470180


   @chenboat: I believe it's a safer bet to directly aim for reverting back to the previous directory structure. This way we prevent any potential issues that's not yet discovered. Also if we change SegmentDeletionManager now for cleaning segments in both paths, we later need to make another change once the directory structure is restored to the old way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
chenboat commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-847459216


   The current issue seems to me is primarily about data deletion. We can first patch the SegmentDeletionManager so that it can deal with segments in both types of paths (with type or without). After that we can revert back to the old behavior of using rawTableName.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on issue #6966: Issues with recent changes on upload segment endpoint

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #6966:
URL: https://github.com/apache/incubator-pinot/issues/6966#issuecomment-847470000


   > The current issue seems to me is primarily about data deletion. We can first patch the SegmentDeletionManager so that it can deal with segments in both types of paths (with type or without). After that we can revert back to the old behavior of using rawTableName.
   
   We are seeing production problems with storage, so the best way to move forward is to fix the bleeding and then repair (in the background) what is broken. Please coordinate with @sajjad-moradi  to fix so that upload goes to the right place, and then we can discuss (and add to this issue) how other installations can repair their systems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org