Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/02/06 00:35:02 UTC

[GitHub] [pinot] kkrugler opened a new issue #8141: Support pushFileNamePattern job conf option

kkrugler opened a new issue #8141:
URL: https://github.com/apache/pinot/issues/8141


   We do daily builds of offline segments using Hadoop, and then store the results in HDFS in the directory that is configured as our Pinot cluster’s deep store. Our build generates 35 new (or more typically updated) per-month segments each day, which we then deploy to our Pinot cluster via a metadata push.
   
   What this means is that we’ve got a deep store directory in HDFS with ≈ 1200 segments (representing 3 years of data) for a table. When we do the metadata push, every segment is downloaded, its metadata is extracted, and the resulting metadata tarball is sent to the controller. This currently takes about 3 hours, but we only want to send the 35 new segments.
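
A rough back-of-the-envelope estimate of the potential saving, using only the approximate figures quoted above. It assumes push time scales linearly with the number of segments, which the thread does not confirm, so treat the result as an order-of-magnitude sketch.

```python
# Approximate figures taken from the issue description.
total_segments = 1200            # segments in the deep store directory
total_push_seconds = 3 * 60 * 60 # "about 3 hours" for a full metadata push
new_segments_per_day = 35        # segments that actually changed

# Assumption: cost per segment is roughly constant (linear scaling).
seconds_per_segment = total_push_seconds / total_segments
estimated_seconds = seconds_per_segment * new_segments_per_day

print(f"per segment: {seconds_per_segment:.0f} s")
print(f"estimated push for {new_segments_per_day} segments: {estimated_seconds / 60:.2f} min")
```

Under that assumption, pushing only the changed segments would take minutes rather than hours.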
   
   It seems like a simple solution would be to support a new, optional `pushFileNamePattern` parameter in the job conf, which could be used to filter down to only the segments we care about. The format could be the same as the existing `includeFileNamePattern` pattern.
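
To make the proposal concrete, here is a minimal sketch of the filtering step. It is not Pinot code: `select_segments_to_push`, the HDFS paths, and the table name are all hypothetical, and Python's `fnmatch` is used as a stand-in for whatever glob matching the job runner applies to `includeFileNamePattern`. The two matchers differ in details (for example, `fnmatch` lets `*` cross `/` boundaries), so this only illustrates the idea.

```python
from fnmatch import fnmatchcase


def select_segments_to_push(segment_paths, push_file_name_pattern=None):
    """Return only the segment paths matching a (hypothetical) pushFileNamePattern.

    With no pattern, every segment is returned, i.e. the behavior is unchanged.
    """
    if push_file_name_pattern is None:
        return list(segment_paths)

    # Assumption: the value uses a "glob:" prefix, as includeFileNamePattern
    # values typically do; strip it before matching.
    pattern = push_file_name_pattern
    if pattern.startswith("glob:"):
        pattern = pattern[len("glob:"):]

    return [path for path in segment_paths if fnmatchcase(path, pattern)]


# Hypothetical deep store listing with per-month segments.
deep_store = [
    "hdfs://namenode/pinot/mytable/mytable_2021-12.tar.gz",
    "hdfs://namenode/pinot/mytable/mytable_2022-01.tar.gz",
    "hdfs://namenode/pinot/mytable/mytable_2022-02.tar.gz",
]

selected = select_segments_to_push(deep_store, "glob:**/mytable_2022-*.tar.gz")
for path in selected:
    print(path)
```

Making the option optional keeps existing job specs working as before; only jobs that set the pattern would push a subset.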


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] mayankshriv commented on issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #8141:
URL: https://github.com/apache/pinot/issues/8141#issuecomment-1037272894


   Ah yes, `metadata` push internally pushes `uri + metadata`. I am still unsure why it takes 3 hours. When you say every segment is downloaded and metadata is extracted, are you referring to the push job or to the controller download? If the former, the segment generation part could provide the metadata as well (to avoid the download and extract). If the latter, the controller should not be downloading; that would be a bug.


[GitHub] [pinot] kkrugler closed issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
kkrugler closed issue #8141:
URL: https://github.com/apache/pinot/issues/8141


   


[GitHub] [pinot] kkrugler commented on issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
kkrugler commented on issue #8141:
URL: https://github.com/apache/pinot/issues/8141#issuecomment-1037430769


   Re segment download & metadata extraction, yes that's happening where the push job is running. I've got a separate (in progress) PR that does a streaming extract of just the two files needed for metadata (and ensures these files are first in the segment tarball), which does speed things up, but not as much as I'd expected.
   
   So there might very well be a bug where the controller, or some other part of the system, is doing unneeded work, or there's a bottleneck. I guess I could run our controller with trace-level logging and then do the push job to get an idea of where the time is going.
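
The streaming extract described above can be sketched roughly as follows. This is not the code from the PR: the function name is hypothetical, and the two file names (`metadata.properties` and `creation.meta`) are an assumption, since the comment does not name them. The sketch relies on Python's `tarfile` stream mode (`r|gz`), which reads members strictly in order, so iteration can stop once both files have been seen.

```python
import io
import tarfile

# Assumption: these are "the two files needed for metadata"; the thread does not name them.
METADATA_FILE_NAMES = ("metadata.properties", "creation.meta")


def read_metadata_files(stream, wanted=METADATA_FILE_NAMES):
    """Read only the wanted members from a gzipped tar stream, stopping early."""
    found = {}
    # "r|gz" is tarfile's sequential stream mode: no seeking is required.
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            base_name = member.name.rsplit("/", 1)[-1]
            if member.isfile() and base_name in wanted:
                # In stream mode a member must be read before moving to the next one.
                found[base_name] = tar.extractfile(member).read()
                if len(found) == len(wanted):
                    break  # stop iterating; later members are never extracted
    return found


def build_demo_tarball():
    """Build a small in-memory segment-like tarball with the metadata files first."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in [
            ("seg/v3/metadata.properties", b"segment.name = demo\n"),
            ("seg/v3/creation.meta", b"\x00\x01"),
            ("seg/v3/columns.psf", b"x" * 100_000),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


metadata = read_metadata_files(build_demo_tarball())
print(sorted(metadata))
```

The early exit only pays off when the metadata files sit at the front of the tarball, which is why the ordering guarantee matters. With a remote input stream, the bulk of each segment would then not need to be read at all.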


[GitHub] [pinot] mayankshriv commented on issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #8141:
URL: https://github.com/apache/pinot/issues/8141#issuecomment-1036756463


   I am curious, why does the metadata push take 3 hours? Are you not using the metadata + uri push?


[GitHub] [pinot] kkrugler commented on issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
kkrugler commented on issue #8141:
URL: https://github.com/apache/pinot/issues/8141#issuecomment-1039819790


   Closed via https://github.com/apache/pinot/commit/f12e62522b8d09c8525a7482c92141042e3155c2


[GitHub] [pinot] mayankshriv commented on issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on issue #8141:
URL: https://github.com/apache/pinot/issues/8141#issuecomment-1037590394


   Thanks for confirming @kkrugler, and also for the fixes. Yeah, it would be great if we can identify the bottleneck, so we can fix that too.


[GitHub] [pinot] kkrugler commented on issue #8141: Support pushFileNamePattern job conf option

Posted by GitBox <gi...@apache.org>.
kkrugler commented on issue #8141:
URL: https://github.com/apache/pinot/issues/8141#issuecomment-1036831250


   I don't know of any `metadata + uri push` option, just `metadata` or `uri` (or `tar`). We're using the metadata push, which should be the most efficient.

