Posted to user@drill.apache.org by Boris Chmiel <bo...@yahoo.com.INVALID> on 2015/11/17 17:43:29 UTC
Text Vectorisation
Hello users,
I'm trying to vectorize a text field from a CSV file, and I plan to use the FLATTEN function to analyze the words used. I tried a combination of regex and CONVERT_FROM to transform the text into a parquet array, without success. Has anyone already achieved that?
Thanks,
Boris
Re: Text Vectorisation
Posted by Steven Phillips <st...@dremio.com>.
Could you elaborate a bit on what it is you are trying to do, as well as
what you have tried and what result you saw?
Thanks.
Re: Text Vectorisation
Posted by Boris Chmiel <bo...@yahoo.com.INVALID>.
Hi,
The idea was to do a simple text classification. I managed to put together a quick and dirty query that fulfills my needs:
text_file.csv:
A ;Get faster insights without the overhead
B ;Leverage your existing SQL skillsets
Query:
SELECT columns[0] id,
       FLATTEN(CONVERT_FROM('["' || REGEXP_REPLACE(columns[1], ' ', '","') || '"]', 'JSON')) text
FROM dfs.tmp.`text_file.csv`
Result:
+-----+------------+
| id  | text       |
+-----+------------+
| A   | Get        |
| A   | faster     |
| A   | insights   |
| A   | without    |
| A   | the        |
| A   | overhead   |
| B   | Leverage   |
| B   | your       |
| B   | existing   |
| B   | SQL        |
| B   | skillsets  |
+-----+------------+
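To make the trick easier to follow, here is a minimal Python sketch of what the expression does to each row. It is only a local emulation, not Drill itself: build_json_array and flatten_rows are hypothetical helper names, str.replace stands in for REGEXP_REPLACE, json.loads stands in for CONVERT_FROM(..., 'JSON'), and the sample rows are assumed to be already split on ';' (ids shown without the trailing space).

```python
import json


def build_json_array(text):
    """Mimic '["' || REGEXP_REPLACE(text, ' ', '","') || '"]'.

    Every single space becomes '","', so the words end up as the
    elements of a JSON array literal.
    """
    return '["' + text.replace(' ', '","') + '"]'


def flatten_rows(rows):
    """Mimic FLATTEN(CONVERT_FROM(..., 'JSON')): one output row per word."""
    out = []
    for row_id, text in rows:
        for word in json.loads(build_json_array(text)):
            out.append((row_id, word))
    return out


# Sample rows from text_file.csv (assumed already split on ';').
rows = [
    ("A", "Get faster insights without the overhead"),
    ("B", "Leverage your existing SQL skillsets"),
]

result = flatten_rows(rows)
for row_id, word in result:
    print(row_id, word)

# Two limits of the string-building approach, visible in the emulation:
# consecutive spaces yield empty words, and a double quote in the text
# produces invalid JSON.
print(json.loads(build_json_array("a  b")))
try:
    json.loads(build_json_array('say "hi"'))
except json.JSONDecodeError as err:
    print("invalid JSON:", err)
```

The same limits would presumably apply to the query: text containing double quotes or backslashes would need escaping before CONVERT_FROM can parse it, and a pattern such as ' +' in REGEXP_REPLACE might be a way to collapse runs of spaces, though that is untested here.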
Boris
On Thursday, 19 November 2015 at 4:37, Ted Dunning <te...@gmail.com> wrote:
> Boris,
> I think I know what you mean by text vectorization, but I am uncertain how
> you are trying to do it with SQL.
> Can you give a sample query?
> What result did you expect?
Re: Text Vectorisation
Posted by Ted Dunning <te...@gmail.com>.
Boris,
I think I know what you mean by text vectorization, but I am uncertain how
you are trying to do it with SQL.
Can you give a sample query?
What result did you expect?