Posted to commits@pinot.apache.org by "shuitai (via GitHub)" <gi...@apache.org> on 2023/06/12 09:59:38 UTC

[GitHub] [pinot] shuitai opened a new issue, #10896: Is pinot possible to support batch Ingestion with Upsert?

shuitai opened a new issue, #10896:
URL: https://github.com/apache/pinot/issues/10896

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "rohityadav1993 (via GitHub)" <gi...@apache.org>.
rohityadav1993 commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1962725406

   I tried to upload segments to an upsert table, following this previous [discussion doc](https://docs.google.com/document/d/1STYxZsUYGcYrzHdmOBymQ1NkpxuEzLa4OTiw4mGHiKk/edit#heading=h.2nui75aa5u6b), and ran into the following challenges:
   
   1. Generate a segment with the `simple` name generator strategy using IngestionUtils and upload it:
       There are checks in place that expect the segment to have an LLC segment name:
   ```
       j.l.IllegalArgumentException: Invalid LLC segment name: testtable_1695407400000_1695414594000_0_1
           at c.g.common.base.Preconditions.checkArgument(Preconditions.java:210)
           at o.a.p.c.utils.LLCSegmentName.<init>(LLCSegmentName.java:42)
           at o.a.p.c.u.PeerServerSegmentFinder.getPeerServerURIs(PeerServerSegmentFinder.java:64)
           at o.a.p.c.d.m.r.RealtimeTableDataManager.downloadSegmentFromPeer(RealtimeTableDataManager.java:604)
           ... 15 common frames omitted
       Wrapped by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Invalid LLC segment name: testtable_1695407400000_1695414594000_0_1
           at o.a.p.c.d.m.r.RealtimeTableDataManager.downloadSegmentFromPeer(RealtimeTableDataManager.java:612)
           at o.a.p.c.d.m.r.RealtimeTableDataManager.downloadAndReplaceSegment(RealtimeTableDataManager.java:539)
           at o.a.p.c.d.m.r.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:420)
           at o.a.p.s.s.h.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDa...
   ```
   2. Use the `fixed` name generator strategy and give the segment an LLC-style name:
   ```
   Got invalid segment metadata when adding segment: rta_rider_sessions_test_REALTIME__0__0__19700101T0000Z for table: testtable_REALTIME, reason: New uploaded LLC segment must have start/end offset in the segment metadata
   ```
   
   We may have to solve this with one of the options below:
   
   1. Allow offline-segment-like names in real-time tables.
   2. Allow LLC-segment-like names for uploaded segments without enforcing offset metadata (the client would need to manage segment name conflicts).
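
To make the naming mismatch concrete, here is a minimal Python sketch of the shape check that the two names above run into. It assumes an LLC segment name is exactly four parts joined by a double underscore (table name, partition id, sequence number, creation time), which matches the name used in attempt 2. `parse_llc_name` and `is_llc_name` are hypothetical helpers written for illustration, not Pinot's `LLCSegmentName` class.

```python
from typing import NamedTuple

SEPARATOR = "__"  # assumed separator between LLC name fields


class LLCName(NamedTuple):
    table_name: str
    partition_id: int
    sequence_number: int
    creation_time: str


def parse_llc_name(segment_name: str) -> LLCName:
    """Hypothetical stand-in for an LLC name check; raises ValueError on mismatch."""
    parts = segment_name.split(SEPARATOR)
    if len(parts) != 4:
        raise ValueError(f"Invalid LLC segment name: {segment_name}")
    table, partition, sequence, created = parts
    # int() also raises ValueError if partition or sequence is not numeric.
    return LLCName(table, int(partition), int(sequence), created)


def is_llc_name(segment_name: str) -> bool:
    """Return True only if the name has the assumed four-part LLC shape."""
    try:
        parse_llc_name(segment_name)
        return True
    except ValueError:
        return False
```

Under this assumption, the `simple`-generator name from attempt 1 contains no double underscore and is rejected, while the name from attempt 2 parses with partition id 0. Whichever option is chosen, a check of this kind is what would need to be relaxed or replaced.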




Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1964733773

   In order to upload offline segments to an upsert table, they need to be partitioned.
   @chenboat Could you please share the steps for bootstrapping an upsert table with offline segments?




[GitHub] [pinot] Jackie-Jiang commented on issue #10896: Is pinot possible to support batch Ingestion with Upsert?

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1595431499

   This should be straightforward because a real-time table supports all the operations of an offline table. We could move the upsert handling logic into `BaseTableDataManager` so that it can be shared by both table types.




[GitHub] [pinot] tibrewalpratik17 commented on issue #10896: Is pinot possible to support batch Ingestion with Upsert?

Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1587038746

   @atris and I are working on the design doc for this and will create an issue for it soon.




Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "rohityadav1993 (via GitHub)" <gi...@apache.org>.
rohityadav1993 commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1968398090

   @Jackie-Jiang
   `segmentPartitionConfig` makes sense. The problem is elsewhere.
   
   If we upload an offline segment (the controller identifies it by the offline segment naming convention, e.g. `testtable_1695407400000_1695414594000_0_1`), the controller is able to figure out the partition from the name and assign the segment to the right server. The server then fails to load the segment due to the LLCSegment name validation.
   
   If we create a segment name following the LLCSegment naming convention, the controller tries to add it to the upsert table but rejects it during metadata validation, so the upload fails.
   
   I believe we will have to refine the validations to make backfills to upsert tables work.
   It would also be useful to discuss whether we want to allow offline segment names for real-time upsert tables or introduce a new naming convention with its own validation.




Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1967830614

   @rohityadav1993 That shouldn't be the case if you have explicit partitioning enabled. Take a look at this [example](https://docs.pinot.apache.org/basics/data-import/upsert#example), specifically the `segmentPartitionConfig`.




Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "shuitai (via GitHub)" <gi...@apache.org>.
shuitai commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1962762566

   @chenboat
   
   If there are duplicate records in batch ingestion, the upsert feature could dedup them.
   When overwriting batch data, the client needs to dedup with Spark, Flink, or other tools.
   Upsert is Pinot's biggest advantage over Druid; doing the upsert in Pinot could help customers save ETL costs.
   So upsert support in batch ingestion would also be very useful.




[GitHub] [pinot] chenboat commented on issue #10896: Is pinot possible to support batch Ingestion with Upsert?

Posted by "chenboat (via GitHub)" <gi...@apache.org>.
chenboat commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1603445829

   @Jackie-Jiang I do not understand the full context here. @shuitai You may need to explain more. Does batch ingestion mean Pinot offline table batch ingestion? If so, Pinot offline table updates are done through segment upload. You can already overwrite an existing segment. Why do we need an extra feature here?




Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1970060654

   Actually, the exception was thrown by `RealtimeTableDataManager.downloadSegmentFromPeer()`, which shouldn't be invoked at all. I guess what happened is that some other exception was thrown in `downloadSegmentFromDeepStore()`, and it then somehow fell back to the peer download, which is not possible for offline segments. Can you search the logs for `Download segment {} from deepstore uri {} failed.`?
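
The suspected sequence can be modelled roughly as follows. This is a toy Python sketch of the hypothesis above, not Pinot code: `download_from_deep_store`, `download_from_peer`, and `download_segment` are hypothetical stand-ins, the deep-store failure is simulated, and the peer path is assumed to require a four-part LLC-style name.

```python
def download_from_deep_store(segment_name: str, uri: str) -> str:
    # Simulated failure: in the hypothesis, some unrelated error occurs here.
    raise IOError(f"Download segment {segment_name} from deepstore uri {uri} failed.")


def download_from_peer(segment_name: str) -> str:
    # Assumption: peer download only works for names with four "__"-separated parts.
    if len(segment_name.split("__")) != 4:
        raise ValueError(f"Invalid LLC segment name: {segment_name}")
    return f"peer://{segment_name}"


def download_segment(segment_name: str, uri: str) -> str:
    try:
        return download_from_deep_store(segment_name, uri)
    except IOError as deep_store_error:
        # The fallback replaces the root cause with the peer-download error,
        # so the original failure is only visible in this log line.
        print(f"deep store failed: {deep_store_error}")
        return download_from_peer(segment_name)
```

In this model, an offline-style name surfaces only the `Invalid LLC segment name` error to the caller, while the real cause sits in the earlier deep-store log line. That is why the deep-store failure message is the one worth searching for.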




Re: [I] Is pinot possible to support batch Ingestion with Upsert? [pinot]

Posted by "rohityadav1993 (via GitHub)" <gi...@apache.org>.
rohityadav1993 commented on issue #10896:
URL: https://github.com/apache/pinot/issues/10896#issuecomment-1965753872

   The segments are partitioned the same way the stream is pre-partitioned during real-time ingestion. The challenge is the validations that are in place: segments have to follow a naming convention for real-time tables, and offset info needs to be present when loading the segment.
   
   Should we create a new segment type that denotes externally uploaded segments for real-time tables, with separate validations in the addSegment flow?

