Posted to dev@pinot.apache.org by Pinot Slack Email Digest <ap...@gmail.com> on 2021/08/20 02:00:24 UTC

Apache Pinot Daily Email Digest (2021-08-19)

### _#general_

  
 **@albertopang:** @albertopang has joined the channel  
 **@sidarthar:** @sidarthar has joined the channel  
 **@stuart.edgington:** @stuart.edgington has joined the channel  
 **@chxing:** Hi All, do we need to add an index (like an inverted index) for the time column, or does the time column already have an index itself? thx  
**@ken:** I think that if the entries in the time column are sorted, Pinot
will figure that out and automatically add a sorted index.  
**@mayanks:** Yes, if a column is sorted Pinot will identify that and add a sorted index. Typically, you don't need to set an inverted index on the time column, either because it is already sorted, or it is partitioned enough (naturally) that Pinot can prune out rows without needing an explicit index.  
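(For reference, a sketch of the relevant part of a table config; the column name `timestampMs` is hypothetical, and the field names should be verified against your Pinot version's table config reference. `sortedColumn` tells Pinot which column to sort on when flushing realtime segments; for offline segments, sortedness is detected automatically at load time:)
```json
{
  "tableIndexConfig": {
    "sortedColumn": ["timestampMs"],
    "invertedIndexColumns": []
  }
}
```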
**@chxing:** Thx @ken @mayanks :grinning:  
 **@cjfudge:** @cjfudge has joined the channel  
 **@rgoyal2191:** @rgoyal2191 has joined the channel  
 **@nawazshahm:** @nawazshahm has joined the channel  

###  _#random_

  
 **@albertopang:** @albertopang has joined the channel  
 **@sidarthar:** @sidarthar has joined the channel  
 **@stuart.edgington:** @stuart.edgington has joined the channel  
 **@cjfudge:** @cjfudge has joined the channel  
 **@rgoyal2191:** @rgoyal2191 has joined the channel  
 **@nawazshahm:** @nawazshahm has joined the channel  

###  _#troubleshooting_

  
 **@albertopang:** @albertopang has joined the channel  
 **@deemish2:** Hi, I am testing the retention period on realtime as well as offline tables, with “retentionTimeValue”: 1 and “retentionTimeUnit”: “HOURS”. In that case, data should be deleted from the table after 1 hour. But I can still see the data after 2 hours  
**@xiangfu0:** The Pinot retention manager kicks off every 6 hours by default, I think. You can configure it in the controller. See here:  
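(A sketch of the setting involved; the key name is from older Pinot controller configs and should be verified against your version's controller configuration reference. The value is just an example:)
```properties
# Run the retention manager every hour instead of the 6-hour default
controller.retention.frequencyInSeconds=3600
```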
**@deemish2:** Thanks Xiang  
 **@bajpai.arpita746462:** Hi Everyone, I am also trying the retention period on a realtime table, with a retention period of 1 hour. But I can still see the segments more than 1 hour after creation. In the screenshot below, the segment got created at 11:30 am today but it is still present  
 **@sidarthar:** @sidarthar has joined the channel  
 **@syedakram93:** Is there any option to reload segments in parallel?  
**@ken:** I haven’t seen any such option, though each server will be reloading segments in parallel. Also, the low-level code loads segments in response to messages received, but I don’t know if that message handling is done in parallel (threaded). Maybe @jackie.jxt or @g.kishore could comment here? :slightly_smiling_face:  
**@g.kishore:** Segment reload is done in parallel. You can control it using some low-level Helix config dynamically  
**@jackie.jxt:** @syedakram93 Segment reload on each server is sequential, and that is somewhat intentional, because loading in parallel can take too much resources while the server still needs to serve queries. Generating indexes on multiple segments in parallel can also cause memory issues  
**@g.kishore:** my bad, I thought Helix messages are processed in parallel.
@jackie.jxt are we intentionally making it single threaded?  
**@jackie.jxt:** @g.kishore For a whole-table reload, it is a single message per server. We make it single threaded intentionally because of the risks described above. We can add an option to the Helix message to control the parallelism, but users need to understand the side effects of it  
**@ken:** Hi @jackie.jxt - I see
`SegmentFetcherAndLoader.addOrReplaceOfflineSegment()`, which I thought was
how segments got loaded. But that seems to be called by a msg that’s
processing a single segment, not all segments for the server.  
**@g.kishore:** got it. we should definitely create an issue.. someone might
be able to make it multi-threaded and by default numThreads can still be 1  
**@ken:** @g.kishore Agreed. For example, our client’s cluster is small (6-8
servers) but they all are 32 core/128GB, so beefy enough to handle multiple
downloads in parallel. And we pre-build the segment indexes in a Hadoop job,
so that reduces the CPU & memory impact during segment loading.  
**@jackie.jxt:** @ken I think we are discussing 2 different things here. There are 2 scenarios:
1. Server restart: segments are loaded via the Helix state transition, which happens in parallel and can be configured via Helix config (by default 40 threads)
2. Manually triggered reload when the index config is updated in the table config: sequential, because it requires adding indexes on the fly  
**@jackie.jxt:** So basically we want to add an option to use multiple threads
for the second scenario  
**@jackie.jxt:** Created an issue to track this:  
**@ken:** @jackie.jxt so when I load say 1000 segments for a new offline table
(not server restart), I assume that’s another situation where helix state
transition msgs are processed in parallel, right?  
**@jackie.jxt:** Yes, new segments are also processed via the helix state
transition  
 **@syedakram93:** like no. of threads (num of segments)  
 **@stuart.edgington:** @stuart.edgington has joined the channel  
 **@cjfudge:** @cjfudge has joined the channel  
 **@cjfudge:** Hello - It appears that Pinot uses an empty kafka consumer group id (low level consumer) - however I think this is being deprecated from a kafka perspective, as I see this in the kafka log... will this be a problem?
> Support for using the empty group id by consumers is deprecated and will be removed in the next major release  
**@g.kishore:** I don't think this will be a problem. Pinot does not rely on the kafka consumer group. We probably used an empty group id because of the API available at that time. We should be able to use the new API without a change in functionality.  
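(A sketch of why manual partition assignment sidesteps the group-id deprecation: with `assign()` the consumer never joins a consumer group, so no `group.id` needs to be set at all. The broker address and topic/partition below are hypothetical, and this requires `kafka-clients` on the classpath:)
```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ManualAssignSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
    // Note: no ConsumerConfig.GROUP_ID_CONFIG at all (rather than an empty string).
    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      TopicPartition partition = new TopicPartition("myTopic", 0); // hypothetical topic
      consumer.assign(Collections.singletonList(partition)); // manual assignment, no group coordination
      consumer.seek(partition, 0L);                          // offsets are tracked by the caller
      // consumer.poll(...) then reads records without any consumer-group membership
    }
  }
}
```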
 **@anusha.munukuntla:** Hi, I am trying to rotate the logs in Pinot. This is the log4j file which I am using, but the logs are not rotating. Could someone please help me out with it?  
**@ken:** I assume you’ve checked the various Pinot process logs (controller,
broker, server) and you don’t see any Log4J-related errors or warnings at
startup, right?  
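(One speculative cause: Pinot uses Log4j2, so a Log4j 1.x style properties file would be silently ignored. A minimal Log4j2 rolling-file sketch, with hypothetical file names, paths, and thresholds; make sure `-Dlog4j2.configurationFile` points at this file:)
```xml
<Configuration>
  <Appenders>
    <RollingFile name="pinotServerLog" fileName="logs/pinotServer.log"
                 filePattern="logs/pinotServer.log.%d{yyyy-MM-dd}-%i.gz">
      <PatternLayout pattern="%d{yyyy/MM/dd HH:mm:ss.SSS} %p [%c{1}] [%t] %m%n"/>
      <Policies>
        <!-- Roll daily, or earlier if the file exceeds 100 MB -->
        <TimeBasedTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
      <DefaultRolloverStrategy max="10"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="pinotServerLog"/>
    </Root>
  </Loggers>
</Configuration>
```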
 **@rgoyal2191:** @rgoyal2191 has joined the channel  
 **@nawazshahm:** @nawazshahm has joined the channel  
 **@qianbo.wang:** Hi, on this, it suggests checking server logs for the reasons a “table is in a bad state”. Can anyone help and specify which server logs I should look into (i.e. broker, controller, etc)? And is searching for the table name sufficient to find the error, or would any pattern work? Thanks in advance  
**@mayanks:** Hello, server here implies `pinot-server`. You can grep for the
segment name which is in the bad state.  
**@qianbo.wang:** Will try. Thanks!  
 **@chxing:** Hi. I found there are many long-suffix segments in HDFS (used as deep storage in Pinot). Does that mean those segments failed to upload to deep storage? thx  

###  _#pinot-dev_

  
 **@rgoyal2191:** @rgoyal2191 has joined the channel  
 **@grace.walkuski:** Hi! I’m using the pinot jdbc client and I’m getting a timeout:
```
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Request timed out to {broker url} of 60000 ms
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport$BrokerResponseFuture.get(JsonAsyncHttpPinotClientTransport.java:173) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport$BrokerResponseFuture.get(JsonAsyncHttpPinotClientTransport.java:152) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport$BrokerResponseFuture.get(JsonAsyncHttpPinotClientTransport.java:123) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport.executeQuery(JsonAsyncHttpPinotClientTransport.java:102) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.Connection.execute(Connection.java:127) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.Connection.execute(Connection.java:96) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.PreparedStatement.execute(PreparedStatement.java:72) ~[pinot-java-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.PinotPreparedStatement.executeQuery(PinotPreparedStatement.java:193) ~[pinot-jdbc-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
	at org.apache.pinot.client.PinotPreparedStatement.execute(PinotPreparedStatement.java:160) ~[pinot-jdbc-client-0.7.1.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
```
Where is the `60000` ms being set? Can I increase it? Thanks!  
**@xiangfu0:** I think this is set at the broker/server side: `pinot.broker.timeoutMs` and `pinot.server.query.executor.timeout`  
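(For reference, a sketch of where those settings would go; the values are illustrative, and the key names should be checked against your Pinot version's configuration reference:)
```properties
# Broker config (e.g. pinot-broker.conf): broker-side query timeout in ms
pinot.broker.timeoutMs=120000
# Server config (e.g. pinot-server.conf): server-side query executor timeout in ms
pinot.server.query.executor.timeout=120000
```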
**@grace.walkuski:** Gotcha, thanks!  
**@grace.walkuski:** When I run the same query directly against the database via the pinot ui, it runs and takes more than 60000 ms. If it's set at the broker level, wouldn't it timeout there too?  
**@ken:** Hi @xiangfu0 - isn’t this error coming from the `AsyncHttpClient`,
which is being used by `JsonAsyncHttpPinotClientTransport`? If so, then the
only way I see of changing the connection timeout is via system properties,
e.g.
`-Dcom.ning.http.client.AsyncHttpClientConfig.defaultConnectionTimeoutInMS=120000`,
but I haven’t tried that. (also defaultRequestTimeoutInMS, I think)  
**@xiangfu0:** The ui-side timeout parameter is carried with the query itself, so both the broker and server side will override it for that query  
**@xiangfu0:** hmmm  
**@xiangfu0:** ah, you mean it’s http timeout  
**@xiangfu0:** not the query timeout  
**@ken:** I think so, based on the stack trace  
**@xiangfu0:** hmmm  
**@xiangfu0:** From the code, the client side sets 1000 days as the timeout  
**@xiangfu0:** in `JsonAsyncHttpPinotClientTransport.java`  
**@ken:** I think that’s the timeout for how long the `BrokerResponseFuture`
will wait for the HTTP client to return a result (essentially unbounded). But
I think the HTTP client is throwing the timeout exception here.  

###  _#getting-started_

  
 **@mosiac:** @mosiac has joined the channel  
 **@stuart.edgington:** @stuart.edgington has joined the channel  

###  _#releases_

  
 **@stuart.edgington:** @stuart.edgington has joined the channel  
 **@rgoyal2191:** @rgoyal2191 has joined the channel  
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pinot.apache.org
For additional commands, e-mail: dev-help@pinot.apache.org