Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2020/04/28 17:53:03 UTC

[GitHub] [incubator-superset] jzonthemtn opened a new issue #9672: Tokenize text when making a word cloud

jzonthemtn opened a new issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672


   **Is your feature request related to a problem? Please describe.**
   When the word cloud is used on a field that contains sentences, it treats each field value as a single "word", so the cloud ends up showing whole sentences.
   
   **Describe the solution you'd like**
   Add an option to tokenize the text in a field (or simply split it on whitespace).
   
   **Describe alternatives you've considered**
   I considered changing how the data is ingested into the database, but I don't see a good solution from that angle if the word cloud expects one word per field.
   
   **Additional context**
   None.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [incubator-superset] villebro commented on issue #9672: Tokenize text when making a word cloud

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672#issuecomment-620777008


   This is an interesting idea, @jzonthemtn. To make sure the proposed feature is as generic as possible, do you have any suggestions for tokenization options? I'm wondering how to handle periods, commas, special characters, etc.




[GitHub] [incubator-superset] jzonthemtn commented on issue #9672: Tokenize text when making a word cloud

Posted by GitBox <gi...@apache.org>.
jzonthemtn commented on issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672#issuecomment-650776192


   @villebro My recommendation would be to strip punctuation and then split on whitespace. This would work well for my use case. If that is not sufficient for a user, I would suggest they preprocess the text before saving it to their database, so that they control how it is tokenized.
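
The "strip punctuation, then split on whitespace" approach suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not Superset code: the function names `tokenize` and `word_counts` are hypothetical, and "punctuation" is assumed here to mean the ASCII set in `string.punctuation`.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
# Assumption: ASCII-only punctuation is enough for this sketch.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def tokenize(text: str) -> list[str]:
    """Strip punctuation, then split on runs of whitespace."""
    return text.translate(_STRIP_PUNCTUATION).split()


def word_counts(fields: list[str]) -> Counter:
    """Count tokens across all field values, as a word cloud would need."""
    counts: Counter = Counter()
    for field in fields:
        counts.update(tokenize(field))
    return counts


counts = word_counts(["Hello, world!", "Hello again."])
print(counts.most_common(1))  # [('Hello', 2)]
```

One consequence of this simple rule is that apostrophes and hyphens are removed too, so "don't" becomes "dont". That is the kind of case where preprocessing the text before it reaches the database would give more control.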




[GitHub] [incubator-superset] stale[bot] commented on issue #9672: Tokenize text when making a word cloud

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672#issuecomment-683258389


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue `.pinned` to prevent stale bot from closing the issue.
   




[GitHub] [incubator-superset] ktmud commented on issue #9672: Tokenize text when making a word cloud

Posted by GitBox <gi...@apache.org>.
ktmud commented on issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672#issuecomment-650819927


   While this feature would definitely be useful, it’s also pretty easy to create virtual data sources that split strings and explode arrays into rows:
   
   https://stackoverflow.com/questions/51063730/split-one-row-into-multiple-rows-based-on-comma-separated-string-column
   
   https://stackoverflow.com/questions/17942508/sql-split-values-to-multiple-rows
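
As a rough illustration of the virtual data source idea, the sketch below splits a text column into one row per word with a recursive CTE. It runs against an in-memory SQLite database through Python's `sqlite3` module only so that it is self-contained; the table `comments` and its columns are hypothetical, and other databases have their own split and unnest functions, so the SQL would need adapting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding one sentence per row.
conn.execute("CREATE TABLE comments (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(1, "hello big world"), (2, "hello again")],
)

# Peel one word off the front of `rest` on each recursive step.
# A trailing space is appended so the last word is also terminated.
SPLIT_SQL = """
WITH RECURSIVE words(id, word, rest) AS (
    SELECT id, '', body || ' ' FROM comments
    UNION ALL
    SELECT id,
           substr(rest, 1, instr(rest, ' ') - 1),
           substr(rest, instr(rest, ' ') + 1)
    FROM words
    WHERE rest <> ''
)
SELECT word, COUNT(*) AS n
FROM words
WHERE word <> ''
GROUP BY word
ORDER BY n DESC, word
"""

rows = conn.execute(SPLIT_SQL).fetchall()
print(rows)  # [('hello', 2), ('again', 1), ('big', 1), ('world', 1)]
```

A query of this shape returns one row per word with its count, which is the form the word cloud expects. Note that this sketch only splits on single spaces and does not strip punctuation.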
   




[GitHub] [incubator-superset] stale[bot] commented on issue #9672: Tokenize text when making a word cloud

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672#issuecomment-650657087


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue `.pinned` to prevent stale bot from closing the issue.
   




[GitHub] [incubator-superset] stale[bot] closed issue #9672: Tokenize text when making a word cloud

Posted by GitBox <gi...@apache.org>.
stale[bot] closed issue #9672:
URL: https://github.com/apache/incubator-superset/issues/9672


   

