Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2022/06/07 02:59:42 UTC

Apache Pinot Daily Email Digest (2022-06-06)

### _#general_

  
 **@nimrod.raifer:** @nimrod.raifer has joined the channel  
 **@vuppala.kumar:** @vuppala.kumar has joined the channel  
 **@vuppala.kumar:** Hi Pinot Team, do you know how to create, rename, or drop
a column in an existing table? It is tedious to always drop the existing
table and create a new one just to alter column names.  
**@vuppala.kumar:** @mayanks @kharekartik any idea on this?  
**@kharekartik:** Hi, you can edit the schema from the UI itself. You can add
columns. Rename and drop are not supported since the schema should be
backward compatible.  
**@mayanks:** Yes, schema evolution does not require deleting and recreating
table.  
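To illustrate the backward-compatibility rule mentioned above, here is a minimal Python sketch that compares two schema JSON documents and reports changes other than added columns. It is not Pinot's own validation logic; the `orders` schema and its columns are hypothetical, and the check assumes a column is identified by its name, data type, and single-value flag.

```python
def schema_columns(schema):
    """Map column name -> (dataType, single-value flag) across all field spec lists."""
    columns = {}
    for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs"):
        for spec in schema.get(key, []):
            columns[spec["name"]] = (spec["dataType"], spec.get("singleValueField", True))
    return columns


def incompatible_changes(old_schema, new_schema):
    """Return changes that break backward compatibility (empty if only additions)."""
    old_cols = schema_columns(old_schema)
    new_cols = schema_columns(new_schema)
    problems = []
    for name, spec in old_cols.items():
        if name not in new_cols:
            problems.append(f"column '{name}' was dropped or renamed")
        elif new_cols[name] != spec:
            problems.append(f"column '{name}' changed definition")
    return problems


# Hypothetical schemas, for illustration only.
old = {"schemaName": "orders",
       "dimensionFieldSpecs": [{"name": "city", "dataType": "STRING"}]}
added = {"schemaName": "orders",
         "dimensionFieldSpecs": [{"name": "city", "dataType": "STRING"},
                                 {"name": "country", "dataType": "STRING"}]}
renamed = {"schemaName": "orders",
           "dimensionFieldSpecs": [{"name": "town", "dataType": "STRING"}]}

print(incompatible_changes(old, added))    # adding a column is fine
print(incompatible_changes(old, renamed))  # a rename looks like a drop plus an add
```

A rename cannot be told apart from dropping one column and adding another, which is one way to see why it is not treated as a compatible change.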
**@hamed.ahlesaadat:** @hamed.ahlesaadat has joined the channel  
 **@vuppala.kumar:** Hi Pinot community, Can we copy data of one table to new
table?  
**@mark.needham:** You can, by copying the segments across. I wrote a blog
showing a small example of how to do this -  
**@seshasendhil:** @seshasendhil has joined the channel  
 **@kaiqueras:** @kaiqueras has joined the channel  
 **@max:** @max has joined the channel  

###  _#random_

  
 **@nimrod.raifer:** @nimrod.raifer has joined the channel  
 **@vuppala.kumar:** @vuppala.kumar has joined the channel  
 **@hamed.ahlesaadat:** @hamed.ahlesaadat has joined the channel  
 **@seshasendhil:** @seshasendhil has joined the channel  
 **@kaiqueras:** @kaiqueras has joined the channel  
 **@max:** @max has joined the channel  

###  _#troubleshooting_

  
 **@nimrod.raifer:** @nimrod.raifer has joined the channel  
 **@vuppala.kumar:** @vuppala.kumar has joined the channel  
 **@alihaydar.atil:** Hello everyone, Is there a reason why
GroovyFunctionEvaluator returns null on bindings with null values? Would it
cause any side effects to run the script with null bindings? Thanks in advance  
**@mark.needham:** @npawar probably knows best about this.  
**@npawar:** No particular reason. we were just being defensive. this has been
changed in the recent master  
**@alihaydar.atil:** Thanks for the answer :+1:  
 **@hamed.ahlesaadat:** @hamed.ahlesaadat has joined the channel  
 **@tommaso.peresson:** Hi everybody, I have a question for you. I have a
table/schema configured like: ```{ "OFFLINE": { "tableName":
"DailyUniqHll_OFFLINE", "tableType": "OFFLINE", "segmentsConfig": {
"timeType": "DAYS", "retentionTimeUnit": "DAYS", "retentionTimeValue": "365",
"replication": "1", "timeColumnName": "partition", "allowNullTimeValue": false
}, "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" },
"tableIndexConfig": { "enableDefaultStarTree": false, "starTreeIndexConfigs":
[ { "dimensionsSplitOrder": [ "partition", "fields.1", "fields.2", "fields.3",
"fields.4", "fields.5", "fields.6", "fields.7", "fields.8", "fields.9" ],
"functionColumnPairs": [ "SUM__counters.c", "DISTINCTCOUNTHLL__hllState" ],
"maxLeafRecords": 1000 } ], "enableDynamicStarTreeCreation": true,
"aggregateMetrics": false, "nullHandlingEnabled": false, "rangeIndexVersion":
2, "autoGeneratedInvertedIndex": false,
"createInvertedIndexDuringSegmentGeneration": false }, "metadata": {},
"ingestionConfig": { "batchIngestionConfig": { "segmentIngestionType":
"APPEND", "segmentIngestionFrequency": "DAILY" }, "complexTypeConfig": {
"fieldsToUnnest": [ "fields", "counters" ], "delimiter": ".",
"collectionNotUnnestedToJson": "NON_PRIMITIVE" } }, "isDimTable": false } }```
Schema: ```{ "schemaName": "ViewElementDailyUniqHll", "dimensionFieldSpecs": [
{ "name": "fields.1", "dataType": "STRING" }, { "name": "fields.2",
"dataType": "STRING" }, { "name": "fields.3", "dataType": "STRING" }, {
"name": "fields.4", "dataType": "STRING" }, { "name": "fields.5", "dataType":
"STRING" }, { "name": "fields.6", "dataType": "STRING" }, { "name":
"fields.7", "dataType": "STRING" }, { "name": "fields.8", "dataType": "STRING"
}, { "name": "fields.9", "dataType": "STRING" }, { "name": "cubeName",
"dataType": "STRING" }, { "name": "list", "dataType": "LONG",
"singleValueField": false }, { "name": "hllState", "dataType": "BYTES" }, {
"name": "counters.c", "dataType": "INT" } ], "dateTimeFieldSpecs": [ { "name":
"partition", "dataType": "STRING", "format":
"1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd", "granularity": "1:DAYS" } ] }```
When I ingest some data I get a ~10x size increase because of
`DISTINCTCOUNTHLL__hllState` in the star tree index. Is this expected? Is
there something misconfigured?  
**@mark.needham:** do you mean 10x more than if that field isn't included in
the index?  
**@mayanks:** What does hll_state contain? From your config, another HLL will
be created where hll_state is the element in that set. Is that what you
intend?  
**@tommaso.peresson:** > do you mean 10x more than if that field isn't
included in the index? yes  
**@tommaso.peresson:** hll state contains a bytes array representing the pre-
estimation state of the HLL algorithm.  
**@g.kishore:** Which column is HLL representing?  
**@mayanks:** It is the hll_state column, which is a serialized HLL, from what I
understand @g.kishore.  
**@mayanks:** You do have split on several dimensions and a low max leaf
record value, that may be contributing to some (if not all).  
**@g.kishore:** The size increase basically means that one of the fields 1 to
9 has very high cardinality  
**@g.kishore:** And there is not much aggregation happening when the star tree
index is created  
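One rough way to check the point about cardinality is to count how many distinct dimension combinations the data contains relative to its row count. The Python sketch below is illustrative only: it does not model the star-tree structure itself, and the sample rows are made up. A ratio near 1.0 means almost every row has its own combination, so pre-aggregation has little to merge and each surviving record still carries its own aggregated values.

```python
def aggregation_ratio(rows, dimensions):
    """Distinct dimension combinations divided by row count (1.0 means nothing to merge)."""
    combos = {tuple(row[d] for d in dimensions) for row in rows}
    return len(combos) / len(rows)


dims = ["fields.1", "fields.2"]

# Made-up data: fields.2 takes only two values, so rows collapse heavily.
low_cardinality = [{"fields.1": "a", "fields.2": str(i % 2)} for i in range(1000)]

# Made-up data: fields.2 is unique per row, so nothing collapses.
high_cardinality = [{"fields.1": "a", "fields.2": str(i)} for i in range(1000)]

print(aggregation_ratio(low_cardinality, dims))
print(aggregation_ratio(high_cardinality, dims))
```

Running the same count over the nine `fields.*` columns of a sample of the real data would show which dimension is keeping the combinations from collapsing.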
**@luisfernandez:** hey friends, it’s me again, this time around with a
question around partitions, we have an offline job that uploads data to pinot
that we are testing in our sandbox env to the offline tables, when that data
is ingested and i look at the segments it generates i see the partitions being
created like this: in the metadata from the ui
```{\"numPartitions\":8,\"partitions\":[0,1,2,3,4,5,6,7]``` however in our
prod system, which has a hybrid setup I always see one number in the
partitions column, ```{\"numPartitions\":8,\"partitions\":[1]``` is this
something I should be concerned about?  
 **@mayanks:** Prod looks good, dev is not partitioned  
**@luisfernandez:** does that mean that we have to partition the data better?  
**@luisfernandez:** does this impact performance?  
**@mayanks:** It means your data is not seen as partitioned by Pinot, it will
help to fix that if you want thousands of read qps  
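The difference between the two metadata snippets can be reproduced with a small Python sketch. It assumes a stand-in partition function (plain modulo on an integer key) rather than any specific Pinot partition function, and the keys are made up. A segment built from unsorted input contains keys from every partition, while a segment built from input that was split by partition first contains exactly one.

```python
NUM_PARTITIONS = 8


def partition_of(key):
    # Stand-in partition function: plain modulo on an integer key (an assumption).
    return key % NUM_PARTITIONS


def segment_partitions(keys):
    """Sorted partition ids present in one segment."""
    return sorted({partition_of(k) for k in keys})


all_keys = list(range(64))

# Segment cut from the input as-is: keys from every partition end up together.
mixed_segment = all_keys[:16]

# Segment built only from keys that hash to partition 1.
partitioned_segment = [k for k in all_keys if partition_of(k) == 1]

print(segment_partitions(mixed_segment))
print(segment_partitions(partitioned_segment))
```

With one partition id per segment, a query that filters on the partition column can skip every segment whose id does not match, which is where the benefit at high query rates comes from.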
**@karthik.varagini:** I'm facing a problem loading data into an existing
offline table: the data gets overwritten. When I load new data with the
following commands, I lose the old data. Any suggestions?
```
sudo docker run --rm -ti \
  --network=pinot-demo_default \
  -v /home/XXXX/dna/pinot/lookup2/pinot-quick-start:/home/XXXX/dna/pinot/lookup2/pinot-quick-start \
  --name pinot-batch-table-creation \
  apachepinot/pinot:latest AddTable \
  -schemaFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-schema.json \
  -tableConfigFile /home/XXXX/dna/pinot/lookup2/pinot-quick-start/orders-table-offline.json \
  -controllerHost manual-pinot-controller \
  -controllerPort 9000 -exec

sudo docker run --rm -ti \
  --network=pinot-demo_default \
  -v /home/XXXX/dna/pinot/lookup/pinot-quick-start:/home/XXXX/dna/pinot/lookup/pinot-quick-start \
  --name pinot-data-ingestion-job \
  apachepinot/pinot:latest LaunchDataIngestionJob \
  -jobSpecFile /home/XXXX/dna/pinot/lookup/pinot-quick-start/docker-job-spec.yml
```  
**@luisfernandez:** I wonder if this makes it override `overwriteOutput: true`
in your docker-job-spect copy.yml  
**@karthik.varagini:** sorry i have attached wrong file ... here is the actual
one  
**@troy:** @karthik.varagini friendly reminder, please do not use `@-here`. We
want to be respectful of everyone in this community slack channel.  
**@karthik.varagini:** ohk @troy  
**@mark.needham:** I think the new data is likely creating a segment with the
same name as the old data  
**@mark.needham:** is there a timestamp as one of your columns?  
**@mark.needham:** that's probably the easiest way to ensure a unique segment
name  
**@karthik.varagini:** thanks @mark.needham, yes, both segments were being
created with the same name... I tried providing the following details, and it
solved my problem... thanks a ton
```
segmentNameGeneratorSpec:
  type: fixed
  configs:
    segment.name: 'orders1'
```  
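The name collision discussed above can be pictured with a toy Python model. It assumes, as suggested in the thread, that a segment uploaded under an existing name replaces the earlier one; the `upload` and `time_based_name` helpers and the naming pattern are hypothetical and are not Pinot APIs. Under that assumption, a fixed name keeps both batches only if each ingestion job uses a different name, whereas a name derived from the data's time range differs per batch on its own.

```python
def upload(table, segment_name, rows):
    """Model a table as name -> rows; an existing name is replaced, not appended to."""
    table[segment_name] = rows


def time_based_name(table_name, min_time, max_time, sequence_id=0):
    # Hypothetical naming pattern built from the time range covered by the data.
    return f"{table_name}_{min_time}_{max_time}_{sequence_id}"


# Two jobs that both use the same fixed name: the second replaces the first.
fixed_table = {}
upload(fixed_table, "orders1", ["day 1 rows"])
upload(fixed_table, "orders1", ["day 2 rows"])
print(fixed_table)

# Two jobs whose names come from the time column: both segments survive.
timed_table = {}
upload(timed_table, time_based_name("orders", "2022-06-05", "2022-06-05"), ["day 1 rows"])
upload(timed_table, time_based_name("orders", "2022-06-06", "2022-06-06"), ["day 2 rows"])
print(sorted(timed_table))
```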
 **@karthik.varagini:** my docker spec  
 **@mathieu.druart:** Hello everyone, I have an offline Pinot table with a
STRING multi valued column and when I try this request : ```select distinct
myMultiValuedColumn from MyTable where otherColumn in ('MY_VALUE') limit
1000``` I have this error : ``` "message":
"QueryExecutionError:\njava.lang.UnsupportedOperationException\n\tat
org.apache.pinot.segment.spi.index.reader.ForwardIndexReader.readDictIds(ForwardIndexReader.java:84)\n\tat
org.apache.pinot.core.common.DataFetcher$ColumnValueReader.readDictIds(DataFetcher.java:418)\n\tat
org.apache.pinot.core.common.DataFetcher.fetchDictIds(DataFetcher.java:89)\n\tat
org.apache.pinot.core.common.DataBlockCache.getDictIdsForSVColumn(DataBlockCache.java:109)",
"errorCode": 200``` If I remove the distinct or the where clause, I have no
issue. Am I missing something ? Thank you !  
 **@seshasendhil:** @seshasendhil has joined the channel  
 **@kaiqueras:** @kaiqueras has joined the channel  
 **@max:** @max has joined the channel  

###  _#getting-started_

  
 **@nimrod.raifer:** @nimrod.raifer has joined the channel  
 **@vuppala.kumar:** @vuppala.kumar has joined the channel  
 **@hamed.ahlesaadat:** @hamed.ahlesaadat has joined the channel  
 **@seshasendhil:** @seshasendhil has joined the channel  
 **@kaiqueras:** @kaiqueras has joined the channel  
 **@max:** @max has joined the channel  

###  _#introductions_

  
 **@nimrod.raifer:** @nimrod.raifer has joined the channel  
 **@vuppala.kumar:** @vuppala.kumar has joined the channel  
 **@hamed.ahlesaadat:** @hamed.ahlesaadat has joined the channel  
 **@karinwolok1:** :wave: Please help us welcome all the new Pinot
community members! :wine_glass: We're growing so fast!!! *Would love to know
who you are, how you discovered Pinot,* :pinot: *and what brought you here!*
:heart: @sandeep278 @abhiram.p @harshvardhanc @jkylling @iamtherealdarknight
@nimrod.raifer @vuppala.kumar @hamed.ahlesaadat @sowmya.gowda
@karangisreekanth @dave.deep @xiaoyzhu @jaimin @mehmet.tasan @jag959
@pj.kovanen @gunnar.enserro @hareesh.lakshminaraya @rafael.moreno @acching
@dangngoctan2012 @priya.shivakumar @arnaud.zdziobeck @teehan @gaetanmorlet
@matthew @tommaso.peresson @cesaro.angelo @ghita.saouir @m.ram3sh @sonam.dp42
@archetana @gstein @lukas @kevin.peng @csmithson @fb @yanghao @valdamarin.d
@jorick @justin @s.himadri @ahmadreza @attaraas @karthik.challa @kartik.anand
@karthik.varagini @pravin.bange1989 @joe.padamadan @jacob.branch
@piercarlo.paltro @alex.gartner @madison.s204 @wadodkar @kevin.kamel @carolyn
@hui @kingkenway16 @rsohlot @vishnu @visar @rino @adamkeane @dsipple
@kearn.kirkwood @richard.bair @marc.kriguer @kozdemir @fritz.wijaya @jmoots
@sderegt838 @horaymond6 @randika @zotyarex  
 **@seshasendhil:** @seshasendhil has joined the channel  
 **@kaiqueras:** @kaiqueras has joined the channel  
 **@max:** @max has joined the channel  