Posted to dev@asterixdb.apache.org by Torsten Bergh Moss <to...@ig.ntnu.no> on 2019/11/16 14:59:04 UTC

Questionable UDF behaviour

Hi it's me again,


I know it's been mentioned to me before that the examples in https://github.com/idleft/asterix-udf-template/ are meant as templates and not for actual use. However, I hope somebody can shed light on this behaviour, which I can't wrap my head around.


I am running the sample sentiment function, https://github.com/idleft/asterix-udf-template/blob/master/src/main/java/org/apache/asterix/external/library/SentimentFunction.java.


The only change I've made to the original code is changing the cast from JString to JLong on line 40, as the IDs of tweets saved in AsterixDB are numbers (and provoke an error if treated as JStrings).


[Inline screenshot of the code; not preserved in this plain-text version]


When I run the UDF on my dataset of tweets, it only works as expected on the first tweet. For the rest of them it also changes the value of the text-field (which should equal the text in the original tweet), like this:


[Inline screenshot of the results; not preserved in this plain-text version]

The SQL++ command I am using to produce these results is


SELECT testlib#getSentiment(t) AS ProcessedTweet FROM (SELECT id, text FROM Tweets) AS t;


Looking at the code, the UDF gets the text-value on line 41 and populates the text-field on the result-object on line 46. Can somebody give me some hints about why the text-field is set to the same value as the sentiment-field?


Best wishes,

Torsten

Re: Questionable UDF behaviour

Posted by Torsten Bergh Moss <to...@ig.ntnu.no>.
Thanks, that solved it.

Best wishes,
Torsten
________________________________________
From: Xikui Wang <xi...@uci.edu>
Sent: Sunday, November 17, 2019 1:20 AM
To: dev@asterixdb.apache.org
Subject: Re: Questionable UDF behaviour

Hi Torsten,

Sorry about the confusion. The problem that you see is caused by a
minor bug in the template. At line 60 of the sentiment function, we should
create a new JString object instead of getting a JString from the function
helper. A JString obtained from the function helper can be reclaimed by the
helper for parameter setting later, which messes up the data. Replacing line
60 with the following code should resolve your issue. The template repo is
updated as well.

        jString = new JString("");

Best,
Xikui


Re: Questionable UDF behaviour

Posted by Xikui Wang <xi...@uci.edu>.
Hi Torsten,

Sorry about the confusion. The problem that you see is caused by a
minor bug in the template. At line 60 of the sentiment function, we should
create a new JString object instead of getting a JString from the function
helper. A JString obtained from the function helper can be reclaimed by the
helper for parameter setting later, which messes up the data. Replacing line
60 with the following code should resolve your issue. The template repo is
updated as well.

        jString = new JString("");

Best,
Xikui
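
Editor's sketch: the reply above attributes the bug to a helper-owned object being reused for parameters. The toy model below shows how that kind of aliasing could produce the reported symptom (first record correct, later records with text equal to the sentiment). It does not use the AsterixDB API. MutableString, Pool and evaluate are hypothetical stand-ins, and the exact point at which the real function helper reclaims its objects is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PooledStringDemo {
    // Hypothetical stand-in for a mutable string holder such as JString.
    static final class MutableString {
        private String value = "";
        String getValue() { return value; }
        void setValue(String v) { value = v; }
    }

    // Hypothetical stand-in for a helper that recycles the objects it hands out.
    static final class Pool {
        private final List<MutableString> objects = new ArrayList<>();
        private int next = 0;

        MutableString allocate() {
            if (next == objects.size()) {
                objects.add(new MutableString());
            }
            return objects.get(next++);
        }

        // After reset, previously handed-out objects are given out again.
        void reset() { next = 0; }
    }

    // Mimics one UDF call and returns {text, sentiment} as they end up in the result.
    static String[] evaluate(Pool pool, MutableString sentiment, String tweet) {
        MutableString text = pool.allocate(); // argument slot taken from the pool
        text.setValue(tweet);
        sentiment.setValue(text.getValue().length() > 66 ? "Amazing!" : "Boring!");
        String[] out = { text.getValue(), sentiment.getValue() };
        pool.reset(); // the helper reclaims everything it handed out
        return out;
    }

    public static void main(String[] args) {
        // Buggy variant: the sentiment holder itself comes from the pool.
        Pool buggyPool = new Pool();
        MutableString pooled = buggyPool.allocate();
        System.out.println(Arrays.toString(evaluate(buggyPool, pooled, "la verdad q si")));
        System.out.println(Arrays.toString(evaluate(buggyPool, pooled, "hola")));

        // Fixed variant: the sentiment holder is a fresh object owned by the function.
        Pool fixedPool = new Pool();
        MutableString own = new MutableString();
        System.out.println(Arrays.toString(evaluate(fixedPool, own, "la verdad q si")));
        System.out.println(Arrays.toString(evaluate(fixedPool, own, "hola")));
    }
}
```

In this model the first buggy call keeps the text intact because the pool has not been reset yet. From the second call on, the argument slot and the sentiment holder are the same object, so writing the sentiment overwrites the text. Creating the holder with new, as in the fix above, keeps it out of the pool.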


Re: Questionable UDF behaviour

Posted by Torsten Bergh Moss <to...@ig.ntnu.no>.
I guess my attempt to inline screenshots of the code and results, in order to not have to worry about text formatting, failed miserably.


Code:

@Override
public void evaluate(IFunctionHelper functionHelper) throws Exception {
    // Read input record
    JRecord inputRecord = (JRecord) functionHelper.getArgument(0);

    JLong id = (JLong) inputRecord.getValueByName("id");
    JString text = (JString) inputRecord.getValueByName("text");

    // Populate result record
    JRecord result = (JRecord) functionHelper.getResultObject();
    result.setField("id", id);
    result.setField("text", text);

    if (text.getValue().length() > 66) {
        this.jString.setValue("Amazing!");
    } else {
        this.jString.setValue("Boring!");
    }
    result.setField("Sentiment", jString);
    functionHelper.setResult(result);
}


Results:

{ "ProcessedTweet": { "id": 1170705127629611008, "text": "la verdad q si", "Sentiment": "Boring!" } }
{ "ProcessedTweet": { "id": 1170705134428532736, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705158998593541, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705204574085121, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705245414051842, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705264921776129, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705288711852033, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705318881505280, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705358068887558, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705359985684481, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705373050941440, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705421151154177, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705470966894592, "text": "Amazing!", "Sentiment": "Amazing!" } }
{ "ProcessedTweet": { "id": 1170705480815140865, "text": "Amazing!", "Sentiment": "Amazing!" } }


Best wishes,

Torsten
