Posted to user@predictionio.apache.org by Marcus Vinicius <ma...@gmail.com> on 2017/01/19 13:54:56 UTC

text classification in portuguese

Hello guys,

It's me again. I'm trying to classify Portuguese text following the demo
tutorial (http://predictionio.incubator.apache.org/demo/textclassification/).

Has anyone already done this with PredictionIO? What would be the best way
to deal with stemming and Portuguese stop words?

Let me take this opportunity to ask another question. Has anyone had
problems with encoding? My CSV input file is in ISO-8859, and in the Python
script I'm converting the text to UTF-8.

# Python 2: re-encode the ISO-8859-1 bytes read from the CSV as UTF-8
text_utf8 = text.decode('iso-8859-1').encode('utf-8')
client.create_event(
    event="documents",
    entity_type="source",
    entity_id=str(count),  # use the running count as the entity ID
    properties={
        "text": text_utf8,
        "category": attr[2],
        "label": int(attr[3])
    }
)

When I retrieve the event from http://localhost:7070/events.json, I get an
encoded word. Is that right?

{"eventId":"x","event":"documents","entityType":"source","entityId":"73","properties":{"category":"A","text":"Gest\u008bo de Caixa","label":2},"eventTime":"2017-01-19T12:31:27.863Z","creationTime":"2017-01-19T12:31:27.867Z"}
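
One way to check the escaped character in the event above is to parse the JSON and look at the code point it carries. The sketch below assumes Python 3 (the loading script above is written in Python 2 style) and uses only the standard library. The helper name describe_non_ascii and the list of candidate encodings are illustrative assumptions; nothing in this thread establishes the real encoding of the CSV file.

```python
import json
import unicodedata

# The "text" property exactly as it appears in the event quoted above.
raw = r'{"text": "Gest\u008bo de Caixa"}'
text = json.loads(raw)["text"]


def describe_non_ascii(s):
    """Return (code point, Unicode category) for each non-ASCII character."""
    return [("U+%04X" % ord(ch), unicodedata.category(ch))
            for ch in s if ord(ch) > 127]


# Category "Cc" is a control character, "Ll" is a lowercase letter.
print(describe_non_ascii(text))
print(describe_non_ascii("Gestão"))

# Assumption for illustration: recover the single stray byte and reinterpret
# it under a few candidate encodings to see which one yields a plausible letter.
stray = text[4].encode("iso-8859-1")
candidates = {enc: stray.decode(enc)
              for enc in ("iso-8859-1", "cp1252", "mac_roman")}
print(candidates)
```

If the stored character turns out to be a control character rather than a letter, the escape itself is valid JSON, but the bytes may have been decoded with the wrong source encoding before they were sent.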


I really appreciate your attention.


-- 

Marcus Vinicius A. Silva


Fwd: text classification in portuguese

Posted by Suneel Marthi <sm...@apache.org>.
FYI, folks.

Attn: @Wcolen


---------- Forwarded message ----------
From: Gustavo Frederico <gu...@thinkwrap.com>
Date: Thu, Jan 19, 2017 at 9:59 AM
Subject: Re: text classification in portuguese
To: user@predictionio.incubator.apache.org


Marcus, at first sight this looks like correct JSON encoding. JSON itself
escapes non-ASCII characters as \uXXXX sequences.

Regards,
Gustavo


Re: text classification in portuguese

Posted by Gustavo Frederico <gu...@thinkwrap.com>.
Marcus, at first sight this looks like correct JSON encoding. JSON itself
escapes non-ASCII characters as \uXXXX sequences.

Regards,
Gustavo
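
The point about JSON escaping can be illustrated with Python's standard json module. This is only a sketch under the assumption of Python 3; it stands in for whatever serializer the Event Server uses, which this thread does not describe. The properties dictionary is a hypothetical sample modeled on the event in the question.

```python
import json

# Hypothetical sample modeled on the event properties from the question.
properties = {"text": "Gestão de Caixa", "category": "A", "label": 2}

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(properties)

# With ensure_ascii=False the characters are written out literally.
literal = json.dumps(properties, ensure_ascii=False)

print(escaped)
print(literal)

# Both forms are valid JSON and parse back to the same data.
assert json.loads(escaped) == json.loads(literal) == properties
```

Either form is acceptable to a JSON parser, so seeing an escape sequence in the response is not a problem in itself; what matters is which code point the escape names.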
