Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/10/29 02:00:18 UTC

Apache Pinot Daily Email Digest (2021-10-28)

### _#general_

  
 **@bcwong:** If the Server doesn’t have enough memory for all its segments,
does it offload some to disk? What’s the offload directory? Or should I turn
on swap + mmap, and let the OS deal with it? (Sorry I couldn’t find any
documentation for that.)  
**@g.kishore:** Pinot always offloads segments to disk and uses mmap (it lets
the OS deal with it). The only place where it needs enough memory is in
realtime consuming segments (and even in that case, it's only for the inverted
index)  
**@g.kishore:** the offload directory is the data.dir  
**@bcwong:** Awesome. Thanks!  
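To make the above concrete: segments are offloaded under the server's data directory, and the table's load mode controls whether they are memory-mapped or loaded on heap. A minimal sketch with illustrative paths (the property and config key names are from Pinot's server and table configuration; the values are assumptions):

```
# Server config (illustrative path): where segments live on disk
pinot.server.instance.dataDir=/var/pinot/server/data

# Table config snippet (illustrative): map segments via mmap instead of heap
# "tableIndexConfig": { "loadMode": "MMAP" }
```

With `MMAP`, the OS page cache decides which segment pages stay in memory, which matches the "let the OS deal with it" advice above.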
 **@diogo.baeder:** Hi folks! Simple question: is it OK to use the same Kafka
topic to process events that generate rows for multiple tables and schemas, in
a project? Suppose that I have a project, "MyProject", and then I have a
"users" table, a "pets" table and a "cars" table, supposing they're all
REALTIME, and then I want to create and use a "my_project" topic to publish
all these events. Would this be fine?  
**@g.kishore:** Yes, but it will be inefficient, since each table will consume
everything and then have to filter out the events it doesn't need  
**@g.kishore:** It’s ok for testing but would avoid in production  
**@diogo.baeder:** Ah, got it. Thanks man!  
**@diogo.baeder:** I'll use one for each table then.  
 **@jain.arpit6:** Hi, I have a STRING column product_str with a JSON blob
stored in it. I have created an index on the column as mentioned in the docs.
Sample data from the column looks like this:
{"fieldA":{"key1":"val1"},"fieldB":"something"} My query is Select * from
mytable where json_match(product_str, '"$.fieldB"=''somevalue''') I am not
getting any result back with the above query. Any ideas?  
**@mayanks:** Are you getting zero records back? Or the query gets stuck?  
**@jain.arpit6:** It's stuck  
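For reference, a `JSON_MATCH` filter like the one above generally requires the column to be declared as a JSON-indexed column in the table config. A minimal sketch, using the column and table names from the thread (everything else is illustrative):

```json
{
  "tableIndexConfig": {
    "jsonIndexColumns": ["product_str"]
  }
}
```

With the index in place, the query shape from the thread should apply, e.g. `SELECT * FROM mytable WHERE JSON_MATCH(product_str, '"$.fieldB"=''somevalue''')`. Note that segments created before the index was added may need to be reloaded for the index to take effect.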
 **@nair.a:** @nair.a has joined the channel  
 **@lalitbhagtani01:** Hi all, I am looking to use ThirdEye with Pinot, but I
am not finding any good resources. Whatever resources I have found are not
useful. Any suggestions would be helpful.  
**@mayanks:** @pyne.suvodeep could you invite @lalitbhagtani01 to the TE
slack?  
**@pyne.suvodeep:** Hi @lalitbhagtani01 . Can you please share your email?  
**@abhishek:** Hi @pyne.suvodeep can you please invite me as well to the same.
I am sending my email ID on DM  
 **@nair.a:** Hi Team, I have a few queries regarding Pinot hybrid tables. 1)
Let's say we have a primary key pk1 that is available in both the realtime and
offline table; on query, which table is preferred by Pinot, i.e. from which
table will the data be shown in the output? 2) Can I append one record to an
existing offline table? If yes, how soon will it be available to query? Thanks  
**@mayanks:** ```1. Pinot queries both the offline and realtime components for
specific time windows. For example, it queries the realtime table for the
latest data (say, the last 1 day), and the offline table for the rest. It is
not a function of the pk. 2. Data ingestion to offline is at the segment level,
not the record level. For realtime, it is record level and the record is
available as soon as it is ingested into Pinot.```  
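To make the time-boundary behavior described above concrete, here is a hedged sketch of how the broker effectively splits one logical query across the two physical tables of a hybrid table (the column name `ts` and the boundary value are illustrative assumptions):

```
-- Logical query issued by the client:
--   SELECT COUNT(*) FROM mytable

-- Offline side: everything up to and including the time boundary
SELECT COUNT(*) FROM mytable_OFFLINE  WHERE ts <= 1635379200000;

-- Realtime side: everything after the time boundary
SELECT COUNT(*) FROM mytable_REALTIME WHERE ts >  1635379200000;

-- The broker merges the two partial results into the final answer.
```

This is why the choice of table is a function of the record's time column, not of its primary key.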
**@sanjay.a:** Hello Mayank, thanks for the response. Anish and I are working
together on the same product. I would like to add the exact use case to Anish's
question: suppose I ingest (via a Spark job) segments older than 7 days into my
offline table and keep the latest data in realtime (via Kafka). My use case is
that many times I need to update data older than 7 days as well; here I can
just add that record (the new version) to the realtime table. In this case the
older state of that record will already be in the offline table and the newer
version will be in realtime. What will be the final output in such a scenario?  
**@sanjay.a:** Anish's 2nd question: if I receive data older than the realtime
table's recency and want to append it to a segment, how do I do this? Currently
we are using Apache Druid and want to replace it for this very reason: we have
to overwrite an entire segment even to append just 1 record.  
**@mayanks:** @sanjay.a For first question, by updating a record do you mean
mutating column values for a record identified by a primary key? If so, this
is called upsert in Pinot, and currently it works only if you have just the
realtime table.  
**@mayanks:** For 2: If you have a realtime only table, then older data can be
consumed without problem. If you have hybrid table, the older data still gets
ingested into Pinot, but if it is older than the time-boundary from offline
data, it is filtered out today.  
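For the upsert case mentioned above, a minimal sketch of the two pieces of configuration involved (the config keys are from Pinot's upsert feature, which applies to realtime-only tables; the column name is taken from the thread and the rest is illustrative):

```
# Schema snippet (illustrative): declare the primary key used for upserts
"primaryKeyColumns": ["pk1"]

# Realtime table config snippet (illustrative): enable full upsert mode
"upsertConfig": { "mode": "FULL" }
```

With this in place, a newly ingested record with an existing primary key replaces the earlier version at query time, which is the "mutating column values for a record identified by a primary key" behavior described above.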
 **@sanjay.a:** @sanjay.a has joined the channel  
 **@javiervazquezh:** @javiervazquezh has joined the channel  
 **@karinwolok1:** :wave: Welcome to all the new Apache Pinot slack members
who joined us this month! :wine_glass: Would love to know what brought you
here, who you are, and how you found out about Apache Pinot! :heart: @sanjay.a
@javiervazquezh @nair.a @ruicong.xie @daniel.bos @atsushi.sakai
@soyinka.majumder @manoj.purohit @maximo.alves @leon.graveland @pranav.chawla
@anilkprabhala @yawei.li @sasha @stuart.millholland @rionmonster @jacob.medal
@ebyhry @derobj @greyson @shadab.anwar @navdeep @aconbol @bobby.richard
@pennylovema @abhishek @fritzb799 @jeffreyliu34 @sandeep.hadoopadmn
@girishpatel.techbiz @tyler773 @nicole @awadesh.kumar @very312 @arpitc0707
@devlearn75 @aylwin.souza @vinayv @bcwong @philippe.dooze @agsherrick
@jain.arpit6 @vaibhav.gupta @awadesh22kumar @piyush.chauhan @chad
@r.sachdeva9355 @jieshe @alihaydar.atil @valentin.richer @sudhakar.kamireddy
@robbiecomeau @zjureel @mmadou @mustafaf @helario @courage.noko
@singhal.prateek3 @benshahbaz @aabuda @yangguji @hardike @nhas3007 @nemanja
@lalitbhagtani01 @nkuptsov @talgab @tharun.3c @roland.vink @brian.brady
@otiennosharon @flagiron2 @qoega @nicolasdelffon @brunobrandaovn @nolefp
@suman @ss68374 @hristo @seabao @shubhamdhal @sabhi8226 @mbshrikanth @jsegall
@dongxiaoman @camerronadams  

###  _#random_

  
 **@nair.a:** @nair.a has joined the channel  
 **@sanjay.a:** @sanjay.a has joined the channel  
 **@javiervazquezh:** @javiervazquezh has joined the channel  

###  _#troubleshooting_

  
 **@nair.a:** @nair.a has joined the channel  
 **@sanjay.a:** @sanjay.a has joined the channel  
 **@javiervazquezh:** @javiervazquezh has joined the channel  
 **@abhishek:** any pointers ?  
**@g.kishore:** it might be a transient error.. we should probably add a
profile to build only pinot core modules with minimal extensions  
**@ken:** I see some people reporting this as an issue caused by the Confluent
Maven repo not working with mirroring. If that's the issue for you, see  for
solutions; the easiest might be to explicitly add the repository to the
pom.xml:
```<repositories>
  <repository>
    <id>confluent</id>
    <url></url>
  </repository>
</repositories>```  
 **@srirams.ganesh:** Hello - Has anyone tried to connect to Pinot from
Tableau using ?  

###  _#pinot-dev_

  
 **@pranav.chawla:** @pranav.chawla has joined the channel  
 **@atsushi.sakai:** @atsushi.sakai has joined the channel  
 **@dadelcas:** @g.kishore not sure if you've seen my previous message. I'm
going to raise a draft PR in the next couple of days with what I've got so far,
so you and anyone else can give feedback  
**@g.kishore:** Sure  
 **@daniel.bos:** @daniel.bos has joined the channel  
 **@agsherrick:** @agsherrick has joined the channel  

###  _#thirdeye-pinot_

  
 **@daniel.bos:** @daniel.bos has joined the channel  

###  _#getting-started_

  
 **@pranav.chawla:** @pranav.chawla has joined the channel  
 **@atsushi.sakai:** @atsushi.sakai has joined the channel  
 **@daniel.bos:** @daniel.bos has joined the channel  
\---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org