Posted to user@drill.apache.org by Boris Chmiel <bo...@yahoo.com.INVALID> on 2015/11/17 17:43:29 UTC

Text Vectorisation

Hello users,
I'm trying to vectorize a text field of a CSV file and plan to use the FLATTEN function to analyze the words used. I tried a combination of regex and CONVERT_FROM to transform the text into a Parquet array, without success. Has anyone already achieved that?
Thanks,
Boris

Re: Text Vectorisation

Posted by Steven Phillips <st...@dremio.com>.
Could you elaborate a bit on what it is you are trying to do, as well as
what you have tried and what result you saw?

Thanks.


Re: Text Vectorisation

Posted by Boris Chmiel <bo...@yahoo.com.INVALID>.
Hi
The idea was to do a simple text classification. I managed to get a quick and dirty query that fulfills my needs:

text_file.csv:
A ;Get faster insights without the overhead
B ;Leverage your existing SQL skillsets

Query:
SELECT columns[0] id,
       FLATTEN(CONVERT_FROM('["' || REGEXP_REPLACE(columns[1], ' ', '","') || '"]', 'JSON')) text
FROM dfs.tmp.`text_file.csv`

Result:
+-----+------------+
| id  | text       |
+-----+------------+
| A   | Get        |
| A   | faster     |
| A   | insights   |
| A   | without    |
| A   | the        |
| A   | overhead   |
| B   | Leverage   |
| B   | your       |
| B   | existing   |
| B   | SQL        |
| B   | skillsets  |
+-----+------------+
Boris 
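
For readers who want to follow the query step by step without a Drill installation, here is a minimal Python sketch that mirrors its logic: build a JSON array string by replacing each space with `","`, parse it, and emit one row per word. It is not Drill code. The names `split_words`, `flatten`, and `word_counts` are hypothetical, the two sample rows are copied from the message above, and stripping the id and lowercasing the words are assumptions added for the word count.

```python
import json
from collections import Counter

# Sample rows copied from text_file.csv in the message above (';' delimited).
ROWS = [
    "A ;Get faster insights without the overhead",
    "B ;Leverage your existing SQL skillsets",
]


def split_words(text):
    """Mirror CONVERT_FROM('["' || REGEXP_REPLACE(text, ' ', '","') || '"]', 'JSON')."""
    as_json = '["' + text.replace(" ", '","') + '"]'
    return json.loads(as_json)


def flatten(rows):
    """Mirror FLATTEN: emit one (id, word) pair per array element."""
    pairs = []
    for row in rows:
        row_id, text = row.split(";", 1)
        for word in split_words(text):
            # Assumption: the id is stripped here for readability;
            # the query above returns columns[0] unchanged.
            pairs.append((row_id.strip(), word))
    return pairs


pairs = flatten(ROWS)

# Assumption: a case-insensitive word count as one possible follow-up analysis.
word_counts = Counter(word.lower() for _, word in pairs)

for row_id, word in pairs:
    print(row_id, word)
```

The sketch also shows two limits of the string-building approach, at least on the Python side: consecutive spaces produce empty array elements, and a double quote or backslash in the text makes the generated string invalid JSON. The same inputs would presumably need extra cleaning before being passed to the query as well.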


On Thursday, 19 November 2015 at 4:37, Ted Dunning <te...@gmail.com> wrote:
 

 Boris,

I think I know what you mean by text vectorization, but I am uncertain how
you are trying to do it with SQL.

Can you give a sample query?

What result did you expect?


Re: Text Vectorisation

Posted by Ted Dunning <te...@gmail.com>.
Boris,

I think I know what you mean by text vectorization, but I am uncertain how
you are trying to do it with SQL.

Can you give a sample query?

What result did you expect?
