Posted to dev@kafka.apache.org by "Xinyao Hu (JIRA)" <ji...@apache.org> on 2014/04/18 01:42:14 UTC
[jira] [Created] (KAFKA-1403) Adding timestamp to kafka index structure
Xinyao Hu created KAFKA-1403:
--------------------------------
Summary: Adding timestamp to kafka index structure
Key: KAFKA-1403
URL: https://issues.apache.org/jira/browse/KAFKA-1403
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 0.8.1
Reporter: Xinyao Hu
Right now, Kafka doesn't store a timestamp per message. It assumes that all the messages in the same file have the same timestamp, which is the mtime of the file. This makes it inefficient to scan all the messages within a time window, which is a valid use case in a lot of realtime data analysis.
My guess is that this was not implemented for efficiency reasons. A timestamp would cost an additional four bytes per message, which might be pinned in memory for fast access. There might be some simple perf optimizations, such as differential encoding + variable-length encoding, which should bring the cost down to 1-2 bytes per message on average.
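To make the differential + variable-length encoding idea concrete, here is a minimal Python sketch. It is only an illustration, not Kafka code: the function names are hypothetical, timestamps are assumed to be whole seconds (matching the four-byte figure above), and they are assumed to be non-decreasing within a file so that every delta is non-negative. Out-of-order timestamps would need a signed scheme such as zigzag encoding.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("varint requires a non-negative value")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)         # high bit clear: last byte of this value
            return bytes(out)


def encode_timestamps(timestamps):
    """Store each timestamp as a varint delta from the previous one.

    Assumes the input is non-decreasing. The first delta is taken from 0,
    so the first timestamp is stored at full size.
    """
    out = bytearray()
    prev = 0
    for ts in timestamps:
        out += encode_varint(ts - prev)
        prev = ts
    return bytes(out)


def decode_timestamps(data):
    """Invert encode_timestamps: read varint deltas and accumulate them."""
    timestamps = []
    prev = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            timestamps.append(prev)
            value = 0
            shift = 0
    return timestamps


# Hypothetical sample: five messages produced within a few minutes.
sample = [1397778134, 1397778134, 1397778135, 1397778140, 1397778400]
encoded = encode_timestamps(sample)
```

With this sample, the first timestamp takes 5 bytes and the four following deltas (0, 1, 5, 260) take 1, 1, 1, and 2 bytes, for 10 bytes in total versus 20 bytes at a fixed four bytes per message. The saving depends on messages in a file arriving close together in time, so that most deltas fit in a single byte.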
Let me know if this makes sense.
--
This message was sent by Atlassian JIRA
(v6.2#6252)