Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/05/30 03:10:53 UTC
Apache Pinot Daily Email Digest (2022-05-29)
### _#general_
**@karthik.challa:** @karthik.challa has joined the channel
**@kartik.anand:** @kartik.anand has joined the channel
### _#random_
**@karthik.challa:** @karthik.challa has joined the channel
**@kartik.anand:** @kartik.anand has joined the channel
### _#troubleshooting_
**@karthik.challa:** @karthik.challa has joined the channel
**@diogo.baeder:** Hi guys! Is there a way to define the batch ingestion
command to pull a job spec file from S3 instead of the local filesystem?
**@diogo.baeder:** The reason I ask is that in my project I'm
currently generating segments via the `/ingestFromURI` endpoint, which is not
recommended for production because it uses the Controller to do the hard work,
so I started transitioning to using normal ingestion jobs. However, they're
ad-hoc (I generate segments in a way that's more controlled by my
application), where each job should trigger the ingestion of a single segment,
and for this to work I have to put the job spec somewhere. I'd like this
"somewhere" to be S3, so that I don't need to mount an EFS volume in my
containers just for the job spec files.
**@diogo.baeder:** Another question (related to the previous one): after I
trigger a `LaunchDataIngestionJob` job with a file spec, for how long do I
need to keep that job spec file around? Can it be deleted right after the job
finishes, if I downloaded it from somewhere before I triggered the job?
**@jadami:** you can just make a tmp file, no need to keep it around. It gets
logged, I believe. In Scala, something like
```scala
val tmpFile = File.createTempFile("jobSpec", ".yaml")
tmpFile.deleteOnExit()
```
works well
**@jadami:** I know you said you want to use S3 in your previous thread, but
1) I'm not sure you can, and 2) the host should at least have a tmp dir
available
**@diogo.baeder:** Ah, you gave me an idea: I can have a script that generates
the job spec file, triggers the job and then deletes the file right after the
job finishes. Thanks!
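The generate-launch-delete lifecycle described above can be sketched in shell. This is a minimal sketch: the `pinot-admin.sh` path is an assumption about your install layout, and the spec shown is only a fragment (a real job spec also needs input/output directories and table config sections):

```shell
# Sketch of the generate -> launch -> delete lifecycle.
# Assumptions: pinot-admin.sh location, and a minimal illustrative spec body.
SPEC="$(mktemp)" || exit 1
cat > "$SPEC" <<'EOF'
executionFrameworkSpec:
  name: standalone
jobType: SegmentCreationAndTarPush
EOF
# Launch the ingestion job synchronously (commented out; requires a Pinot install):
# bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile "$SPEC"
echo "would launch ingestion with spec: $SPEC"
# Delete the spec as soon as the job finishes.
rm -f "$SPEC"
```

Because `LaunchDataIngestionJob` reads the spec file at startup, deleting it once the job process has exited is safe.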
**@ken:** That’s basically what we do, via a (Java) tool that creates the job
spec file based on CLI parameters. But we just put it in temp storage (same as
what @jadami suggested), so no need to delete it.
**@diogo.baeder:** Got it. I'll delete the spec later because I'll put it in a
tmpfs mount, to avoid losing sight of disk space usage.
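A tmpfs mount like the one mentioned above can be declared in Docker Compose, for example. The service name, image, mount path, and size below are all illustrative assumptions, not taken from the thread:

```yaml
services:
  ingestion-runner:
    image: my-ingestion-image   # assumption: whatever image launches the jobs
    tmpfs:
      # Specs live in RAM and vanish with the container, so they can
      # never silently accumulate on disk.
      - /tmp/jobspecs:size=64m
```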
**@kartik.anand:** @kartik.anand has joined the channel
### _#getting-started_
**@karthik.challa:** @karthik.challa has joined the channel
**@kartik.anand:** @kartik.anand has joined the channel
### _#introductions_
**@karthik.challa:** @karthik.challa has joined the channel
**@kartik.anand:** @kartik.anand has joined the channel
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org