Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/05/30 03:10:53 UTC

Apache Pinot Daily Email Digest (2022-05-29)

### _#general_

  
 **@karthik.challa:** @karthik.challa has joined the channel  
 **@kartik.anand:** @kartik.anand has joined the channel  

###  _#random_

  
 **@karthik.challa:** @karthik.challa has joined the channel  
 **@kartik.anand:** @kartik.anand has joined the channel  

###  _#troubleshooting_

  
 **@karthik.challa:** @karthik.challa has joined the channel  
 **@diogo.baeder:** Hi guys! Is there a way to define the batch ingestion
command to pull a job spec file from S3 instead of the local filesystem?  
**@diogo.baeder:** The reason I'm asking is that in my project I'm currently
generating segments via the `/ingestFromURI` endpoint, which is not
recommended for production because it makes the Controller do the hard work,
so I've started transitioning to normal ingestion jobs. However, they're
ad-hoc (I generate segments in a way that's more controlled by my
application), with each job triggering the ingestion of a single segment, and
for this to work I have to put the job spec somewhere. I'd like this
"somewhere" to be S3, so that I don't need to mount an EFS volume in my
containers just for the job spec files.  
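As far as I know, `LaunchDataIngestionJob` reads the spec from a local path, so one compromise is to keep the spec in S3 and copy it down to a temp file just before launching the job. A minimal sketch in Scala using the AWS SDK v2 (the bucket and key are placeholders, not anything from this thread):

```
import java.io.File

import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.GetObjectRequest

// Pull the job spec down from S3 into a local temp file; the local path can
// then be handed to LaunchDataIngestionJob. Bucket and key are placeholders.
val s3 = S3Client.create()
val localSpec = File.createTempFile("jobSpec", ".yaml")
localSpec.delete() // getObject refuses to write over an existing file
s3.getObject(
  GetObjectRequest.builder()
    .bucket("my-pinot-bucket")
    .key("specs/ingestionJobSpec.yaml")
    .build(),
  localSpec.toPath)
localSpec.deleteOnExit()
```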
**@diogo.baeder:** Another question (related to the previous one): after I
trigger a `LaunchDataIngestionJob` job with a spec file, how long do I need
to keep that spec file around? Can it be deleted right after the job
finishes, if I downloaded it from somewhere before triggering the job?  
**@jadami:** you can just make a tmp file, no need to keep it around. It gets
logged, I believe. In Scala something like
```
import java.io.File

val tmpFile = File.createTempFile("jobSpec", ".yaml")
tmpFile.deleteOnExit()
```
works well  
**@jadami:** I know you said you want to use S3 in your previous thread, but
1) I'm not sure if you can, and 2) the host should at least have a tmp dir
available  
**@diogo.baeder:** Ah, you gave me an idea: I can have a script that generates
the job spec file, triggers the job and then deletes the file right after the
job finishes. Thanks!  
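A minimal sketch of that flow in Scala, assuming `pinot-admin.sh` is on the PATH and the spec YAML has already been generated as a string; the `runIngestionJob` helper name is hypothetical, while `LaunchDataIngestionJob -jobSpecFile` is the standard Pinot admin command:

```
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Write the generated spec to a temp file, launch the ingestion job, and
// remove the file once the job process has exited.
def runIngestionJob(specYaml: String): Int = {
  val specFile = File.createTempFile("jobSpec", ".yaml")
  try {
    Files.write(specFile.toPath, specYaml.getBytes(StandardCharsets.UTF_8))
    // "pinot-admin.sh" is assumed to be on the PATH; adjust for your install.
    new ProcessBuilder("pinot-admin.sh", "LaunchDataIngestionJob",
        "-jobSpecFile", specFile.getAbsolutePath)
      .inheritIO()
      .start()
      .waitFor()
  } finally {
    specFile.delete()
  }
}
```

Checking the returned exit code (or the job's logs) before deleting anything else is a cheap way to confirm the segment push actually succeeded.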
**@ken:** That’s basically what we do, via a (Java) tool that creates the job
spec file based on CLI parameters. But we just put it in temp storage (same as
what @jadami suggested), so no need to delete it.  
**@diogo.baeder:** Got it. I'll delete the spec later because I'll put it in a
tmpfs mount, to avoid losing sight of disk space usage.  
 **@kartik.anand:** @kartik.anand has joined the channel  

###  _#getting-started_

  
 **@karthik.challa:** @karthik.challa has joined the channel  
 **@kartik.anand:** @kartik.anand has joined the channel  

###  _#introductions_

  
 **@karthik.challa:** @karthik.challa has joined the channel  
 **@kartik.anand:** @kartik.anand has joined the channel  