You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/04/25 08:52:35 UTC

[GitHub] [pinot] kmozaid opened a new pull request, #8587: support refreshing realtime segments

kmozaid opened a new pull request, #8587:
URL: https://github.com/apache/pinot/pull/8587

   Instructions:
   1. The PR has to be tagged with at least one of the following labels (*):
      1. `feature`
      2. `bugfix`
      3. `performance`
      4. `ui`
      5. `backward-incompat`
      6. `release-notes` (**)
   2. Remove these instructions before publishing the PR.
    
   (*) Other labels to consider:
   - `testing`
   - `dependencies`
   - `docker`
   - `kubernetes`
   - `observability`
   - `security`
   - `code-style`
   - `extension-point`
   - `refactor`
   - `cleanup`
   
   (**) Use `release-notes` label for scenarios like:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi commented on pull request #8587: support refreshing realtime segments

Posted by GitBox <gi...@apache.org>.
sajjad-moradi commented on PR #8587:
URL: https://github.com/apache/pinot/pull/8587#issuecomment-1108803581

   @kmozaid This is being done as part of opening up upload endpoint for realtime segments - in this PR: https://github.com/apache/pinot/pull/8584
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kmozaid closed pull request #8587: support refreshing realtime segments

Posted by GitBox <gi...@apache.org>.
kmozaid closed pull request #8587: support refreshing realtime segments
URL: https://github.com/apache/pinot/pull/8587


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kmozaid commented on pull request #8587: support refreshing realtime segments

Posted by GitBox <gi...@apache.org>.
kmozaid commented on PR #8587:
URL: https://github.com/apache/pinot/pull/8587#issuecomment-1118222392

   > 
   
   Sure, I will close this PR.
   
   On segment refresh, I think you should do following -
   
   1. Upload a segment (say, with segmenName: test_segment_1) to a realtime table.
   2. Upload a segment with the same name that you used in step 1 and but with some different data in it (similar to purge task which removes some records and upload segment with same name).
   3. Assert that records are updated (may be count has changed if some records were removed in step 2) when you query the table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] sajjad-moradi commented on pull request #8587: support refreshing realtime segments

Posted by GitBox <gi...@apache.org>.
sajjad-moradi commented on PR #8587:
URL: https://github.com/apache/pinot/pull/8587#issuecomment-1111382267

   > > @kmozaid This is being done as part of opening up upload endpoint for realtime segments - in this PR: #8584
   > 
   > ohk, then I can close this PR, right? Also I don't see the below change in `RealtimeTableDataManager` , How Server is going to handle `REFRESH_SEGMENT` message sent by controller when segment is uploaded? Did you verify that segment uploaded are queryable in your code?
   > 
   
   Yes you can close this PR.
   Also good catch on segment refresh message handling on server side. In the integration test that I modified to upload realtime segments, I considered the refresh use case and modified the CRC of the existing segment on ZK for it, but in servers, the CRC remains the same on the local file and the segment refresh path doesn't get to the point to actually download the segment - it uses the existing segment since the CRC of the uploaded segment and the one residing on server are the same. I'll make the changes and try to update the integration test to change the CRC of the local file on server side.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kmozaid commented on pull request #8587: support refreshing realtime segments

Posted by GitBox <gi...@apache.org>.
kmozaid commented on PR #8587:
URL: https://github.com/apache/pinot/pull/8587#issuecomment-1109254853

   > > @kmozaid This is being done as part of opening up upload endpoint for realtime segments - in this PR: #8584
   > 
   > ohk, then I can close this PR, right? Also I don't see the below change in `RealtimeTableDataManager` , How Server is going to handle `REFRESH_SEGMENT` message sent by controller when segment is uploaded? Did you verify that segment uploaded are queryable in your code?
   > 
   > ```
   >   @Override
   >   public void addSegment(File indexDir, IndexLoadingConfig indexLoadingConfig)
   >       throws Exception {
   >     Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
   >     addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
   >   }
   > ```
   > 
   > JFYI, message which controller send to server -
   > 
   > ```
   > ZnRecord=87e551ff-cc98-4fdf-b861-6be3b8d561bd, {CREATE_TIMESTAMP=1650873841979, EXECUTE_START_TIMESTAMP=1650873854007, MSG_ID=87e551ff-cc98-4fdf-b861-6be3b8d561bd, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=transcript__0__2__20220425T0634Z, RESOURCE_NAME=transcript_REALTIME, RETRY_COUNT=0, SRC_CLUSTER=PinotCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_localhost_9001, TGT_NAME=Server_localhost_8001, TGT_SESSION_ID=1000232d2430120, TIMEOUT=-1, segmentName=transcript__0__2__20220425T0634Z, tableName=transcript_REALTIME}{}{}, Stat=Stat {_version=0, _creationTime=1650873842026, _modifiedTime=1650873842026, _ephemeralOwner=0}
   > ```
   
   Refresh Message is being sent here -
   https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/api/upload/ZKOperator.java#L204
   
   Message is being handled here -
   https://github.com/apache/pinot/blob/master/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/SegmentMessageHandlerFactory.java#L96
   
   This is where Unsupported exception is thrown if you don't add implementation of addSegment() method.
   https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java#L373


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kmozaid commented on pull request #8587: support refreshing realtime segments

Posted by GitBox <gi...@apache.org>.
kmozaid commented on PR #8587:
URL: https://github.com/apache/pinot/pull/8587#issuecomment-1109249139

   > @kmozaid This is being done as part of opening up upload endpoint for realtime segments - in this PR: #8584
   
   ohk, then I can close this PR, right? Also I don't see the below change in `RealtimeTableDataManager` , How Server is going to handle `REFRESH_SEGMENT` message sent by controller when segment is uploaded? Did you verify that segment uploaded are queryable in your code?
   
   ```
     @Override
     public void addSegment(File indexDir, IndexLoadingConfig indexLoadingConfig)
         throws Exception {
       Schema schema = ZKMetadataProvider.getTableSchema(_propertyStore, _tableNameWithType);
       addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig, schema));
     }
   ```
   
   JFYI, message which controller send to server -
   ```
   ZnRecord=87e551ff-cc98-4fdf-b861-6be3b8d561bd, {CREATE_TIMESTAMP=1650873841979, EXECUTE_START_TIMESTAMP=1650873854007, MSG_ID=87e551ff-cc98-4fdf-b861-6be3b8d561bd, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=transcript__0__2__20220425T0634Z, RESOURCE_NAME=transcript_REALTIME, RETRY_COUNT=0, SRC_CLUSTER=PinotCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_localhost_9001, TGT_NAME=Server_localhost_8001, TGT_SESSION_ID=1000232d2430120, TIMEOUT=-1, segmentName=transcript__0__2__20220425T0634Z, tableName=transcript_REALTIME}{}{}, Stat=Stat {_version=0, _creationTime=1650873842026, _modifiedTime=1650873842026, _ephemeralOwner=0}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org