Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/07/05 02:00:19 UTC

Apache Pinot Daily Email Digest (2021-07-04)

### _#general_

  
 **@knowledgeisstrengthfo:** @knowledgeisstrengthfo has joined the channel  
 **@knowledgeisstrengthfo:** Hi everyone, we are evaluating Apache Pinot for our analytical use case. We have encountered some scenarios for which we haven't found a proper justification yet. Please help us understand the reasoning behind them and how to address them:
1. Why is inserting into a Pinot table via the Presto connector not supported, when almost all other SQL commands are?
2. Why is updating records with an UPDATE query not allowed on a Pinot table via Presto?
3. If we want to replicate the same set of data values in a Pinot table, how can we do that today without Kafka ingestion? For example, we want to multiply an existing 1M records via `insert into TableA ( select * from TableA )`. Since the Presto connector does not allow inserting into a table and Pinot itself doesn't support subqueries, neither option is available.
4. If we make a mistake naming a column during schema creation and later update the schema, will previously ingested values for that column be picked up automatically? For example, a realtime table has a column called "NAME" that was supposed to be "name". The Kafka stream data ingested so far has values for the "name" attribute, so after the schema change will Pinot automatically update the values for all rows, or do we need to retrofit the "name" values again? If we need to retrofit, what is the best possible way?
5. Can a single query read from both REALTIME and OFFLINE tables? Since subqueries and joins are not supported directly by Pinot, is there any way we can achieve that?  
**@mayanks:** ```1. Ingestion in Pinot has traditionally been via offline jobs and realtime streams. Could you elaborate on the use case that requires inserting rows via Presto?
2. Upsert in Pinot is a newer feature and requires a primary key to identify the row to be updated. While we may definitely explore updates via Presto, they would still likely be primary-key based (as opposed to any generic condition).
3. If you don't want to use Kafka ingestion, you can push the data via an offline pipeline.
4. Schema changes have to be backward compatible, which your example isn't.
5. Offline and realtime tables are internal to Pinot. The client side only sees a single hybrid table, and Pinot answers queries across both the offline and realtime data.```  
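For question 4 above, the backward-compatible route is to add a new "name" column rather than rename "NAME". A minimal sketch of that flow against the controller REST API, assuming a controller at localhost:9000, a schema named myTable, and an updated-schema.json that adds "name" with a default null value (all hypothetical names):
```# Update the schema; existing columns stay untouched, and the new
# "name" column gets its default null value (hypothetical host/files).
curl -X PUT -H "Content-Type: application/json" \
  -d @updated-schema.json \
  "localhost:9000/schemas/myTable"

# Reload existing segments so rows ingested before the change
# pick up the default value for the new column.
curl -X POST "localhost:9000/segments/myTable/reload"```
Even after the reload, old rows only carry the default value; retrofitting real "name" values still means re-ingesting or backfilling that data.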
**@xiangfu0:** For the Presto-Pinot integration, we only connect the query path. There is no support for table or data ops.  
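In other words, the connector supports read-only queries like the sketch below (assuming the catalog is registered as `pinot`; the table `tableA` and column `city` are hypothetical), while INSERT/UPDATE/DELETE against the same table are rejected:
```-- Works: read path, executed against Pinot
SELECT city, COUNT(*) AS cnt
FROM pinot.default.tableA
GROUP BY city
ORDER BY cnt DESC
LIMIT 10;

-- Not supported: the connector has no write path
-- INSERT INTO pinot.default.tableA SELECT * FROM pinot.default.tableA;```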
**@mayanks:** @knowledgeisstrengthfo Based on your questions, I am really curious about what your use case is and how you are trying to use Pinot. Could you please share some details about that?  

### _#random_

  
 **@knowledgeisstrengthfo:** @knowledgeisstrengthfo has joined the channel  

### _#troubleshooting_

  
 **@azri:** Hi, I'm trying to push data from GCS to Pinot. After submitting the job it doesn't seem to do anything and produces no output at all. This is my job spec:
```executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: ''
outputDirURI: '/tmp/ais-pinot/sentences/'
includeFileNamePattern: 'glob:**/**.parquet'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
    configs:
      projectId: 'aton-analytics'
      gcpKey: '/var/pinot/controller/config/gcs-datalake-key.json'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'sentence'
pinotClusterSpecs:
  - controllerURI: ''```  
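For reference, a spec like this is normally launched through the admin script, roughly as below (the spec file path is hypothetical, and the GCS filesystem plugin must be available on the plugin path):
```# Run the standalone batch ingestion job with the spec above
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /path/to/gcs-job-spec.yaml```
Note that `inputDirURI` and `controllerURI` are empty in the pasted spec; if that is not just redaction, the glob would have no directory to scan, which by itself would produce no output.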
**@ken:** This looks odd to me `includeFileNamePattern: 'glob:**/**.parquet'`.
I think it should be `includeFileNamePattern: 'glob:**/*.parquet'`  
**@azri:** I tried that one before, but it's the same, no output.  
**@azri:** Could it be because the data is too big?  
 **@knowledgeisstrengthfo:** @knowledgeisstrengthfo has joined the channel  