Posted to dev@pinot.apache.org by Pinot Slack Email Digest <sn...@apache.org> on 2021/01/15 02:00:27 UTC

Apache Pinot Daily Email Digest (2021-01-14)

### _#general_

  
 **@myeole:** I am trying to fetch Parquet files from S3 and load them into
Pinot using an offline table. I am running this command with my job spec:
./bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile examples/batch/metrics/ingestionJobSpec.yaml
I am seeing the following errors. Any idea how to solve this issue?
Jan 13, 2021 6:34:24 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.*) )?\\(build ?(.*)\\)
  at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
  at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
  at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
  at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:567)
  at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:544)
  at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:431)
  at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:238)
  at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:234)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadP
Failed to generate Pinot segment for file - java.lang.IllegalArgumentException: INT96 not yet implemented.
  at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:251) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
  at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:236) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
  at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:222) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
  at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:235) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
  at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:215) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
  at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:209) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
  at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]  
**@fx19880617:** The issue here is that Pinot uses the parquet-avro lib to read
the file, which doesn't understand the INT96 type  
**@fx19880617:** Is it possible to convert it to int64?  
**@myeole:** you mean in the table schema?  
**@fx19880617:** Yes. If it's still not working, then we may need some fix on
our side to bypass INT96  
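For reference, one way to sidestep the INT96 limitation is to rewrite the Parquet files with INT64 timestamps before ingestion. A minimal sketch using pyarrow, assuming the files are available locally, the paths are placeholders, and millisecond precision is acceptable:
```
# Sketch: rewrite a Parquet file so its timestamp columns are stored as INT64
# (millisecond precision) instead of the deprecated INT96 encoding.
# File paths are placeholders.
import pyarrow.parquet as pq

table = pq.read_table("metrics_int96.parquet")  # pyarrow reads INT96 values as timestamps
pq.write_table(
    table,
    "metrics_int64.parquet",
    use_deprecated_int96_timestamps=False,  # store timestamps as INT64
    coerce_timestamps="ms",                 # millisecond precision
    allow_truncated_timestamps=True,        # truncate finer precision instead of failing
)
```
If the files are produced by Spark, setting `spark.sql.parquet.outputTimestampType` to `TIMESTAMP_MILLIS` (or `TIMESTAMP_MICROS`) on the writing job avoids INT96 at the source.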
**@myeole:** I changed the column to BIGINT in Parquet and to a Unix timestamp
in the input file, and I am using LONG in my schema, but I am seeing this
error. Any ideas?  
**@myeole:** @fx19880617 Failed to generate Pinot segment for file -
java.lang.IllegalStateException: Invalid segment start/end time:
5031-12-29T23:00:00.000Z/5032-01-01T11:00:00.000Z (in millis:
96627164400000/96627380400000) for time column: ingress_timestamp, must be
between: 1971-01-01T00:00:00.000Z/2071-01-01T00:00:00.000Z  
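The 5031/5032 dates suggest the stored values and the unit declared for the time column don't match; the check quoted above expects epoch millis between 1971 and 2071. A quick hedged sanity check in Python, using one raw value from the error, to see which epoch unit puts it in a plausible range:
```
# Sketch: figure out which epoch unit makes the raw time value a plausible date.
# The raw value is taken from the error message above.
from datetime import datetime, timezone

raw = 96627164400000

for unit, divisor in [("seconds", 1), ("millis", 1_000), ("micros", 1_000_000)]:
    try:
        ts = datetime.fromtimestamp(raw / divisor, tz=timezone.utc)
        print(f"as {unit:>7}: {ts.isoformat()}")
    except (OverflowError, OSError, ValueError):
        print(f"as {unit:>7}: out of range")
```
If the unit that yields a sensible date differs from the one declared for `ingress_timestamp` in the table schema, aligning the schema's time format (or converting the values during ingestion) should clear this error.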
 **@humengyuk18:** @humengyuk18 has joined the channel  
 **@zxcware:** Hi team, is there a limit on number of znodes per parent node
in ZK today?  
**@g.kishore:** things typically start slowing down with ZK once you get into
the hundreds of thousands of ZNodes  
**@g.kishore:** we have seen thousands of tables in production and it works
fine  

###  _#random_

  
 **@humengyuk18:** @humengyuk18 has joined the channel  

###  _#troubleshooting_

  
 **@humengyuk18:** @humengyuk18 has joined the channel  
 **@valentin:** Hello, I’m having a weird query issue, when I try to query my
cluster (via Pinot UI) with: ```SELECT "tmpId" from
datasource_5ffdbf421eb80003001818fe WHERE "name" = "identify" AND "clientId" =
"ef8e0112fbac1450776931712bdaad3bb0deb121" GROUP BY "tmpId" LIMIT 1``` The
query is executed But with: ```SELECT "tmpId" from
datasource_5ffdbf421eb80003001818fe WHERE "name" = "identify" AND "clientId" =
"3f8e0112fbac1450776931712bdaad3bb0deb121" --
3f8e0112fbac1450776931712bdaad3bb0deb121 GROUP BY "tmpId" LIMIT 1``` I get the
following error: ```[ { "errorCode": 200, "message":
"QueryExecutionError:\norg.antlr.v4.runtime.misc.ParseCancellationException\n\tat
org.antlr.v4.runtime.BailErrorStrategy.recoverInline(BailErrorStrategy.java:66)\n\tat
org.antlr.v4.runtime.Parser.match(Parser.java:203)\n\tat
org.apache.pinot.pql.parsers.PQL2Parser.expression(PQL2Parser.java:828)\n\tat
org.apache.pinot.pql.parsers.PQL2Parser.expression(PQL2Parser.java:745)\n\tat
org.apache.pinot.pql.parsers.Pql2Compiler.parseToAstNode(Pql2Compiler.java:148)\n\tat
org.apache.pinot.pql.parsers.Pql2Compiler.compileToExpressionTree(Pql2Compiler.java:153)\n\tat
org.apache.pinot.common.request.transform.TransformExpressionTree.compileToExpressionTree(TransformExpressionTree.java:46)\n\tat
org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSubquery(BaseBrokerRequestHandler.java:471)\n\tat
org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:215)\n\tat
org.apache.pinot.broker.api.resources.PinotClientRequest.processSqlQueryPost(PinotClientRequest.java:155)\n\tat
sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)\n\tat
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.lang.reflect.Method.invoke(Method.java:498)\n\tat
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)"
} ]``` I don't really understand the error or why it's happening; the only
thing that changes between the 2 queries is the `clientId` value, which starts
with `ef` in the first query and with `3f` in the 2nd one  
**@mayanks:** Any reason you are using PQL instead of SQL?  
**@valentin:** I’m using SQL (via the pinot ui, the HTTP call is ``)  
**@mayanks:** I see, the stack trace suggested otherwise, but that part is for
the transform, not the query. Not sure why this would happen just by changing
two characters. Can you try removing the quotes from the literals?  
**@mayanks:** Seems you have a typo in the second query? Check the clientId
predicate (it has a `--` in it)  
**@valentin:** You’re talking about the comment? I have the same issue without
it: ```SELECT tmpId from datasource_5ffdbf421eb80003001818fe WHERE name =
"identify" AND clientId = "3f8e0112fbac1450776931712bdaad3bb0deb121" GROUP BY
tmpId LIMIT 1```  
**@mayanks:** Ah, I'm on the phone so I didn't read the syntax correctly. Seems
like a bug, could you file an issue?  
**@g.kishore:** are you using pql or sql?  
**@mayanks:** SQL; however, from the stack trace we see that we internally use
PQL for transforms  
**@mayanks:** FWIW, I can compile the query from the IDE  
**@g.kishore:**
```org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSubquery(BaseBrokerRequestHandler.java:471)```  
**@g.kishore:** there is no subquery here  
**@mayanks:** Yeah, noticed that too. But the code doesn't have 'if'  
**@g.kishore:** ok  
**@g.kishore:** @jackie.jxt ^^  
**@jackie.jxt:** @valentin Can you try single quotes instead of double quotes?
In SQL, double quotes are for identifiers; you need single quotes for literals  
**@contact:** He opened an issue there:  
**@contact:** @jackie.jxt Indeed that works with single quote for the literal
value, thanks for the insight  
**@contact:** However, I'm not sure why it sometimes gets "correctly"
interpreted  
**@jackie.jxt:** @contact IIRC, it is not interpreted correctly, it just
doesn't throw an exception. `"name" = "identify"` is interpreted as `name -
identify = 0`, where both `name` and `identify` are treated as identifiers
(columns)  
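For completeness, here is the failing query from the thread with the quoting fixed: string literals in single quotes, identifiers left bare (or double-quoted). A small sketch submitting it to the broker's SQL endpoint with Python's requests; the broker address and port are placeholders for your deployment:
```
# Sketch: the query from the thread with single-quoted string literals,
# sent to the Pinot broker's SQL endpoint. The broker URL is a placeholder.
import requests

sql = (
    "SELECT tmpId FROM datasource_5ffdbf421eb80003001818fe "
    "WHERE name = 'identify' "
    "AND clientId = '3f8e0112fbac1450776931712bdaad3bb0deb121' "
    "GROUP BY tmpId LIMIT 1"
)

resp = requests.post("http://localhost:8099/query/sql", json={"sql": sql})
resp.raise_for_status()
print(resp.json())
```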
 **@amitchopra:** Hi, I'm trying to troubleshoot an issue I am facing. I have a
K8S cluster set up with 4 server instances. For the server, I changed replicas
to 2 and did a helm upgrade. Even though the servers in K8S went from 4 to 2, I
still see the deleted ones in a bad state in the Pinot UI. Shouldn't the
servers deleted from K8S be removed from Pinot as well? Secondly, the problem I
am facing is that 2 of the segments are mapped to the deleted servers, and now
it is not letting me drop those server instances manually either. Those 2
segments are also in a bad state. Ideas?  
**@npawar:** you probably need to untag the old instances and do a rebalance.
That should move all segments to the live ones, and only then will you be able
to delete the instances  
**@npawar:** @fx19880617 are there any special considerations for a K8s setup
other than this?  
**@npawar:**  
**@npawar:** guide for untag and rebalance ^^  
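For reference, a rough sketch of the untag-then-rebalance flow against the controller REST API; the controller address, instance name, and table name below are placeholders, so check the linked guide and your version's Swagger UI for the exact endpoints and options:
```
# Sketch: untag a retired server, then rebalance the table so its segments
# move to the remaining tagged servers. All names and URLs are placeholders.
import requests

CONTROLLER = "http://localhost:9000"
INSTANCE = "Server_pinot-server-3.pinot-server-headless_8098"
TABLE = "myTable"

# 1. Clear the instance's tags so no new segments can be assigned to it.
requests.put(
    f"{CONTROLLER}/instances/{INSTANCE}/updateTags", params={"tags": ""}
).raise_for_status()

# 2. Rebalance the table; segments hosted on the untagged instance get moved.
requests.post(
    f"{CONTROLLER}/tables/{TABLE}/rebalance",
    params={"type": "OFFLINE", "dryRun": "false"},
).raise_for_status()
```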
**@amitchopra:** got it. Thanks  
**@fx19880617:** removing servers from k8s won't delete the Pinot server
instances automatically  
**@fx19880617:** deleting Pinot server instances requires manual operations  
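Once an instance no longer hosts any segments (after the rebalance above), it can be dropped through the controller. A minimal sketch, again with placeholder names:
```
# Sketch: drop a server instance that no longer hosts segments.
# The controller rejects the call while the instance is still in use.
import requests

CONTROLLER = "http://localhost:9000"
INSTANCE = "Server_pinot-server-3.pinot-server-headless_8098"

resp = requests.delete(f"{CONTROLLER}/instances/{INSTANCE}")
resp.raise_for_status()
print(resp.json())
```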
**@amitchopra:** ok, thanks  

###  _#announcements_

  
 **@valentin:** @valentin has joined the channel  