Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/12/04 17:54:52 UTC

[GitHub] [incubator-pinot] pabrahamusa opened a new issue #6317: Allow No Raw Data Index with S3 data store option for TEXT Index

pabrahamusa opened a new issue #6317:
URL: https://github.com/apache/incubator-pinot/issues/6317


   I would like to suggest a feature for the TEXT_MATCH index.
   
   At the moment, setting "noRawDataForTextIndex": "true" stores just the index without the raw data. This is a great feature because it reduces the size of the index. The drawback is that we can only search the column; we cannot fetch the actual data itself. What I would like is an option to store the index without raw data while keeping the actual data in a different store such as S3. When a search is run against the index, Pinot would pull the corresponding data from S3 and return it in the results. This would give us a cheaper datastore with efficient search.
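
The request above can be illustrated with a small Python sketch. It is not Pinot code: `TextIndexWithoutRawData`, `ExternalRawStore`, and `search_and_fetch` are hypothetical names, a dict stands in for the S3 bucket, and whitespace tokenization stands in for the real text index. It only shows the proposed flow, assuming the index keeps doc IDs and the raw text lives elsewhere.

```python
from collections import defaultdict


class TextIndexWithoutRawData:
    """Toy inverted index: keeps term -> doc IDs and never stores the text."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Whitespace tokenization is a simplification of a real text index.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


class ExternalRawStore:
    """Stand-in for an object store such as S3; a dict plays the bucket."""

    def __init__(self):
        self._objects = {}

    def put(self, doc_id, text):
        self._objects[f"raw/{doc_id}"] = text.encode("utf-8")

    def get(self, doc_id):
        return self._objects[f"raw/{doc_id}"].decode("utf-8")


def search_and_fetch(index, store, term):
    """Search the local index first, then pull only the matching raw values."""
    return [(doc_id, store.get(doc_id)) for doc_id in index.search(term)]


index = TextIndexWithoutRawData()
store = ExternalRawStore()
docs = {
    0: "Java heap out of memory",
    1: "disk full on server",
    2: "memory leak in Java client",
}
for doc_id, text in docs.items():
    index.add(doc_id, text)   # the index sees the text but does not keep it
    store.put(doc_id, text)   # the raw value goes to the cheaper store

results = search_and_fetch(index, store, "java")
```

In this sketch only the matching documents are read from the external store, which is the cost trade-off the request is after: search stays local, and raw values are fetched on demand.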


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [incubator-pinot] fx19880617 commented on issue #6317: Allow No Raw Data Index with S3 data store option for TEXT Index

Posted by GitBox <gi...@apache.org>.

fx19880617 commented on issue #6317:
URL: https://github.com/apache/incubator-pinot/issues/6317#issuecomment-742293541


   Pinot doesn't support serving queries directly from the deep store. To answer queries, all the data has to be on local disk.
   We are adding lazy data-loading support in https://github.com/apache/incubator-pinot/pull/6250
   I feel this could be the next step once the above PR is merged. We could then separate out the data part, download only part of it to local disk for serving normal queries, and pull the other parts from the deep store for on-demand queries.
   
   @siddharthteotia what do you think?



[GitHub] [incubator-pinot] pabrahamusa commented on issue #6317: Allow No Raw Data Index with S3 data store option for TEXT Index

Posted by GitBox <gi...@apache.org>.

pabrahamusa commented on issue #6317:
URL: https://github.com/apache/incubator-pinot/issues/6317#issuecomment-743384566


   @fx19880617   Tiered storage plus lazy loading will surely help. However, if a raw index is considerably small, we could preferably store it on SSDs and separate the data out to object storage. The only caveat here is how to map the index to data locations in the object store. There should be some way to mount the object store locally to pull the corresponding data, for example by defining partitions based on the index, which would help locate the object directly. There is also the possibility of introducing zstd to further reduce the size and increase the speed of transfer.



[GitHub] [incubator-pinot] pabrahamusa edited a comment on issue #6317: Allow No Raw Data Index with S3 data store option for TEXT Index

Posted by GitBox <gi...@apache.org>.

pabrahamusa edited a comment on issue #6317:
URL: https://github.com/apache/incubator-pinot/issues/6317#issuecomment-743384566


   @fx19880617   Tiered storage plus lazy loading will surely help. However, if a no-raw-data index is considerably small, we could preferably store it on SSDs and separate the data out to object storage. The only caveat here is how to map the index to data locations in the object store. There should be some way to mount the object store locally to pull the corresponding data, for example by defining partitions based on the index, which would help locate the object directly. There is also the possibility of introducing zstd to further reduce the size and increase the speed of transfer.
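
One way to picture the partition idea is the Python sketch below. Everything in it is an assumption made for illustration: the names `partition_key`, `write_partitions`, and `read_doc`, the `raw/part-NNNNN.bin` key layout, and the partition size are hypothetical, doc IDs are assumed to be dense (0, 1, 2, ...), and values are assumed to contain no newlines. `zlib` stands in for zstd only because it ships with Python; a dict again plays the object store.

```python
import zlib

DOCS_PER_PARTITION = 2  # hypothetical partition size, kept tiny for the example


def partition_key(doc_id):
    """Derive the object key from the doc ID alone, so no lookup table is needed."""
    return f"raw/part-{doc_id // DOCS_PER_PARTITION:05d}.bin"


def write_partitions(docs):
    """Pack raw values into compressed partition objects: {object_key: bytes}."""
    grouped = {}
    for doc_id in sorted(docs):
        grouped.setdefault(partition_key(doc_id), []).append(docs[doc_id])
    # zlib is a stand-in for zstd; the layout does not depend on the codec.
    return {
        key: zlib.compress("\n".join(values).encode("utf-8"))
        for key, values in grouped.items()
    }


def read_doc(bucket, doc_id):
    """Fetch one partition object, decompress it, and pick out a single value."""
    blob = zlib.decompress(bucket[partition_key(doc_id)])
    values = blob.decode("utf-8").split("\n")
    return values[doc_id % DOCS_PER_PARTITION]


docs = {
    0: "Java heap out of memory",
    1: "disk full on server",
    2: "memory leak in Java client",
    3: "connection reset by peer",
    4: "timeout waiting for lock",
}
bucket = write_partitions(docs)
value = read_doc(bucket, 3)
```

Because the object key is computed from the doc ID, a search hit translates directly into one object read. The trade-off is that a whole partition is transferred and decompressed to return a single value, so the partition size would need tuning against the compression ratio.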

