Posted to user@mrql.apache.org by Eldon Carman <ec...@ucr.edu> on 2014/07/07 23:43:54 UTC

Query Error

Hi,

I have two MRQL queries on XML that are giving me error messages. I have
included a working MRQL query that I used as a base for the next two
queries. Below is the example query and a basic outline of the XML
format.

select (r)
from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
    stations in source(xml, "sample_xml/stations.xml", {"station"}),
    l in stations.locationLabels
where text(stations.id) = text(sensors.station)
    and text(sensors.date) = "1976-07-04T00:00:00.000"
    and text(l.displayName) = "WASHINGTON";

<data>
  <date></date>
  <dataType></dataType>
  <station></station>
  <value></value>
</data>

<station>
  <id></id>
  <locationLabels>
    <id></id>
    <displayName></displayName>
  </locationLabels>
  <!-- multiple locationLabels -->
</station>



1) The following query only changes the where conditions and adds an
aggregate function, yet it fails with the error shown below.

min(
    select (toInt(text(sensors.value)))
    from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
        stations in source(xml, "sample_xml/stations.xml", {"station"}),
        l in stations.locationLabels
    where text(stations.id) = text(sensors.station)
        and toInt(substring(text(sensors.date), 0, 4)) = 2001
        and text(sensors.dataType) = "TMIN"
        and text(l.id) = "FIPS:US"
) / 10;

Here is the error I am getting. Any suggestions?

java.lang.ClassCastException: org.apache.mrql.MR_int cannot be cast to org.apache.mrql.Bag
        at org.apache.mrql.MRQL_Lambda_5.eval(UserFunctions_0.java from JavaSourceFromString:27)
        at org.apache.mrql.MapReduceAlgebra$1.hasNext(MapReduceAlgebra.java:59)
        at org.apache.mrql.JoinOperation$JoinReducer.reduce(JoinOperation.java:292)
        at org.apache.mrql.JoinOperation$JoinReducer.reduce(JoinOperation.java:184)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)



2) I also have one more query that again changes only a few things in the
where clause and returns a bag of results.

select (n, d, v)
from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
    d in sensors.date,
    v in sensors.value,
    stations in source(xml, "sample_xml/stations.xml", {"station"}),
    n in stations.displayName,
    l in stations.locationLabels
where text(stations.id) = text(sensors.station)
    and toInt(substring(text(d), 0, 4)) = 2000
    and text(sensors.dataType) = "TMAX"
    and text(l.displayName) = "WASHINGTON";

It produces the following error message:

Apache MRQL version 0.9.2 (compiled distributed Hadoop MapReduce mode with 5 reducers)
Query type: !bag(( XML, XML, XML ))
*** MRQL error at line 12: wrong projection: project(x_27,value) (Node(tuple(string,bag(tuple(string,string)),list(XML))))

Thanks for your feedback.

Re: Query Error

Posted by Eldon Carman <ec...@ucr.edu>.
Thanks for the quick response. The issue has been resolved.


On Tue, Jul 29, 2014 at 4:16 PM, Leonidas Fegaras <fe...@cse.uta.edu>
wrote:

>  I think I fixed the bug. It was in the Hadoop MapReduce join, which should
> have cached the result of the reduce function in memory. The attached patch
> fixes the bug. Let me know if this works for you.
> Leonidas
>
>
> On 07/29/2014 10:42 AM, Leonidas Fegaras wrote:
>
> Hi Preston,
> I think the best way to resolve this issue is to upload your data and your
> queries to some site (such as Dropbox) and give me permission to download
> them so I can recreate your bug. I have tried XML queries on DBLP data,
> which is a few hundred MB (see queries/dblp-pagerank.mrql). Even if XML
> elements cross block boundaries, the XML input format should handle them
> correctly. The XML parser, though, is very simple. One possibility is that
> an XML element was not properly closed or contained unusual characters.
> Leonidas
>
>
> On 07/28/2014 11:54 PM, Eldon Carman wrote:
>
>  So I finally had some time to look at MRQL again. :-)
>
>  The patch did not fix my issue. The error does not occur on small XML
> files (less than 64MB). It occurs only when the file is large enough to be
> split into multiple chunks on HDFS. Does this help identify the issue? It
> seems to be related to merging data together after parsing during the
> map operation.
>
> Thanks,
> Preston
>
> On Tue, Jul 8, 2014 at 7:52 AM, Leonidas Fegaras <fe...@cse.uta.edu>
> wrote:
>
>>  Hi Eldon,
>> I couldn't reproduce your errors exactly, but I found one genuine bug in
>> the code generation for XML projections, which is fixed by the attached
>> patch.
>> I used the following data and queries:
>>
>> sample_xml/sensors.xml:
>> <data>
>>   <date>2001/2/2</date>
>>   <dataType>TMIN</dataType>
>>   <station>xyz</station>
>>   <value>35</value>
>> </data>
>>
>> sample_xml/stations.xml:
>> <station>
>>   <id>xyz</id>
>>   <locationLabels>
>>     <id>FIPS:US</id>
>>     <displayName>WASHINGTON</displayName>
>>   </locationLabels>
>> </station>
>>
>> Q1:
>>
>> min(
>>     select (toInt(text(sensors.value)))
>>     from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
>>         stations in source(xml, "sample_xml/stations.xml", {"station"}),
>>         l in stations.locationLabels
>>     where text(stations.id) = text(sensors.station)
>>         and toInt(substring(text(sensors.date), 0, 4)) = 2001
>>         and text(sensors.dataType) = "TMIN"
>>         and text(l.id) = "FIPS:US"
>> ) / 10;
>>
>>  Q2:
>>
>> select (n, d, v)
>> from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
>>     d in sensors.date,
>>     v in sensors.value,
>>     stations in source(xml, "sample_xml/stations.xml", {"station"}),
>>     l in stations.locationLabels,
>>     n in l.displayName
>> where text(stations.id) = text(sensors.station)
>>     and toInt(substring(text(d), 0, 4)) = 2001
>>     and text(sensors.dataType) = "TMIN"
>>     and text(l.displayName) = "WASHINGTON";
>>
>> When I run Q1, it gives 3.
>> Q2 gives {
>> (<displayName>WASHINGTON</displayName>,<date>2001/2/2</date>,<value>35</value>)
>> }
>>
>> Leonidas

Re: Query Error

Posted by Leonidas Fegaras <fe...@cse.uta.edu>.
I think I fixed the bug. It was in the Hadoop MapReduce join, which should
have cached the result of the reduce function in memory. The attached
patch fixes the bug. Let me know if this works for you.
Leonidas
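
The remark about caching the result of the reduce function can be
illustrated in general terms. The sketch below is not the MRQL patch and
does not claim to show the actual bug. It only demonstrates, in Python, why
a lazily produced result that is traversed more than once needs to be
materialized first. The function name reduce_fn and its doubling logic are
hypothetical.

```python
def reduce_fn(values):
    """Hypothetical reduce function that returns a lazy, single-pass result."""
    return (v * 2 for v in values)


# Without caching: the generator is exhausted after the first traversal.
lazy = reduce_fn([1, 2, 3])
first_pass = list(lazy)
second_pass = list(lazy)  # nothing left to read

# With caching: materialize the result once, then traverse it freely.
cached = list(reduce_fn([1, 2, 3]))
first_cached = list(cached)
second_cached = list(cached)
```

Here second_pass is empty while second_cached still holds all three values,
which is the difference that caching in memory makes for a single-pass
result.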



Re: Query Error

Posted by Leonidas Fegaras <fe...@cse.uta.edu>.
Hi Preston,
I think the best way to resolve this issue is to upload your data and
your queries to some site (such as Dropbox) and give me permission to
download them so I can recreate your bug. I have tried XML queries on
DBLP data, which is a few hundred MB (see queries/dblp-pagerank.mrql).
Even if XML elements cross block boundaries, the XML input format should
handle them correctly. The XML parser, though, is very simple. One
possibility is that an XML element was not properly closed or contained
unusual characters.
Leonidas
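
How a split-aware reader can cope with elements that cross block
boundaries may be easier to see in code. The sketch below is a common
technique for this kind of input and is not taken from the MRQL source:
each split owns every element whose opening tag starts inside it, and the
reader is allowed to read past the end of the split to reach the closing
tag. The function read_records, the 20-byte split size and the sample bytes
are all assumptions for illustration. It also assumes the tag has no
attributes and is never nested inside itself.

```python
def read_records(data: bytes, start: int, end: int, tag: bytes):
    """Yield each <tag>...</tag> element whose opening tag begins in [start, end)."""
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    pos = start
    while True:
        begin = data.find(open_tag, pos)
        if begin == -1 or begin >= end:
            return  # the next element, if any, belongs to a later split
        finish = data.find(close_tag, begin)
        if finish == -1:
            return  # element was never closed, so nothing more can be read
        finish += len(close_tag)  # may lie beyond `end`; that is intended
        yield data[begin:finish]
        pos = finish


# Three records of 29 bytes each, cut into 20-byte splits so that every
# record crosses at least one split boundary.
DATA = b"".join(
    b"<data><value>%d</value></data>" % i for i in (1, 2, 3)
)
SPLIT_SIZE = 20
splits = [(s, min(s + SPLIT_SIZE, len(DATA))) for s in range(0, len(DATA), SPLIT_SIZE)]
collected = [rec for s, e in splits for rec in read_records(DATA, s, e, b"data")]
```

With these splits each record is returned exactly once, by the split that
contains its opening tag. An element that is never closed makes the reader
stop, which is one way an unclosed element could show up only on large,
multi-split files.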




Re: Query Error

Posted by Eldon Carman <ec...@ucr.edu>.
So I finally had some time to look at MRQL again. :-)

The patch did not fix my issue. The error does not occur on small XML
files (less than 64MB). It occurs only when the file is large enough to be
split into multiple chunks on HDFS. Does this help identify the issue? It
seems to be related to merging data together after parsing during the
map operation.

Thanks,
Preston


Re: Query Error

Posted by Leonidas Fegaras <fe...@cse.uta.edu>.
Hi Eldon,
I couldn't reproduce your errors exactly, but I found one genuine bug in
the code generation for XML projections, which is fixed by the
attached patch.
I used the following data and queries:

sample_xml/sensors.xml:
<data>
   <date>2001/2/2</date>
   <dataType>TMIN</dataType>
   <station>xyz</station>
   <value>35</value>
</data>

sample_xml/stations.xml:
<station>
   <id>xyz</id>
   <locationLabels>
     <id>FIPS:US</id>
     <displayName>WASHINGTON</displayName>
   </locationLabels>
</station>

Q1:
min(
     select (toInt(text(sensors.value)))
     from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
         stations in source(xml, "sample_xml/stations.xml", {"station"}),
         l in stations.locationLabels
     where text(stations.id) = text(sensors.station)
         and toInt(substring(text(sensors.date), 0, 4)) = 2001
         and text(sensors.dataType) = "TMIN"
         and text(l.id) = "FIPS:US"
) / 10;

Q2:
select (n, d, v)
from sensors in source(xml, "sample_xml/sensors.xml", {"data"}),
     d in sensors.date,
     v in sensors.value,
     stations in source(xml, "sample_xml/stations.xml", {"station"}),
     l in stations.locationLabels,
     n in l.displayName
where text(stations.id) = text(sensors.station)
     and toInt(substring(text(d), 0, 4)) = 2001
     and text(sensors.dataType) = "TMIN"
     and text(l.displayName) = "WASHINGTON";

When I run Q1, it gives 3.
Q2 gives { 
(<displayName>WASHINGTON</displayName>,<date>2001/2/2</date>,<value>35</value>) 
}

Leonidas
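
For readers without an MRQL installation, Q1 and Q2 above can be
approximated in plain Python over the same two sample records. This is only
a sketch under assumptions: it uses xml.etree.ElementTree instead of MRQL's
XML source, the helper names q1 and q2 are hypothetical, Q2 returns element
text rather than XML nodes, and the division is assumed to be integer
division because the reported result of 35 / 10 is 3. Note that n is bound
from l.displayName, since displayName sits under locationLabels in the
outlined format rather than directly under station.

```python
import xml.etree.ElementTree as ET

# Sample records copied from the message above.
SENSORS_XML = """<data>
  <date>2001/2/2</date>
  <dataType>TMIN</dataType>
  <station>xyz</station>
  <value>35</value>
</data>"""

STATIONS_XML = """<station>
  <id>xyz</id>
  <locationLabels>
    <id>FIPS:US</id>
    <displayName>WASHINGTON</displayName>
  </locationLabels>
</station>"""

sensors = [ET.fromstring(SENSORS_XML)]
stations = [ET.fromstring(STATIONS_XML)]


def q1(sensors, stations):
    """Join on station id, filter, then take min(value) with integer division by 10."""
    values = [
        int(s.findtext("value"))
        for s in sensors
        for st in stations
        for label in st.findall("locationLabels")
        if st.findtext("id") == s.findtext("station")
        and int(s.findtext("date")[0:4]) == 2001
        and s.findtext("dataType") == "TMIN"
        and label.findtext("id") == "FIPS:US"
    ]
    return min(values) // 10


def q2(sensors, stations):
    """Return (displayName, date, value) text triples for matching rows."""
    return [
        (label.findtext("displayName"), s.findtext("date"), s.findtext("value"))
        for s in sensors
        for st in stations
        for label in st.findall("locationLabels")
        if st.findtext("id") == s.findtext("station")
        and int(s.findtext("date")[0:4]) == 2001
        and s.findtext("dataType") == "TMIN"
        and label.findtext("displayName") == "WASHINGTON"
    ]
```

Calling q1(sensors, stations) and q2(sensors, stations) on this data gives
the same values as the results quoted above, apart from Q2 holding plain
text instead of XML elements.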

