Posted to dev@asterixdb.apache.org by Mike Carey <dt...@gmail.com> on 2018/01/29 00:30:35 UTC

Re: Hyracks Job Requirement Configuration

+ dev


On 1/28/18 3:37 PM, Rana Alotaibi wrote:
> Hi all,
>
> I would like to make AsterixDB utilize all available CPU cores (39)
> that I have for the following query:
>
> USE mimiciii;
> SET `compiler.parallelism` "39";
> SET `compiler.sortmemory` "128MB";
> SET `compiler.joinmemory` "265MB";
> SELECT P.SUBJECT_ID
> FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
> WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
>              E.FLAG = 'abnormal' AND
>              I.FLUID='Blood' AND
>              I.LABEL='Haptoglobin'
>
>
> The total memory size that I have is 125GB (57GB for the AsterixDB
> buffer cache). Running the above query, I got the following error:
>
> "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU 
> cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
>
> How can I change this default capacity configuration? I'm looking at
> this page: https://asterixdb.apache.org/docs/0.9.2/ncservice.html
> Could you please point me to the appropriate configuration parameter?
>
> Thanks
> -- Rana
>
>
>
>
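The two figures in the HYR0009 message are easier to compare in GiB. The Python sketch below does only that arithmetic. It assumes nothing beyond what the message states: the job is rejected because its memory requirement exceeds the reported capacity, while the core counts match at 39. The `shortfall` helper is a hypothetical name, not part of AsterixDB.

```python
GIB = 1024 ** 3  # bytes per GiB

# Figures copied from the HYR0009 error message above.
REQUIRED_BYTES = 10_705_403_904
CAPACITY_BYTES = 3_258_744_832


def shortfall(required: int, capacity: int) -> int:
    """Bytes missing for the job to fit; 0 if it already fits (hypothetical helper)."""
    return max(0, required - capacity)


if __name__ == "__main__":
    print(f"required: {REQUIRED_BYTES / GIB:.2f} GiB")
    print(f"capacity: {CAPACITY_BYTES / GIB:.2f} GiB")
    print(f"missing:  {shortfall(REQUIRED_BYTES, CAPACITY_BYTES) / GIB:.2f} GiB")
```

Running it prints about 9.97 GiB required against 3.03 GiB available, a gap of roughly 6.94 GiB. That is roughly how much the query-processing memory would need to grow for this job (at 39 cores and these sort/join settings) to be admitted.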


Re: Hyracks Job Requirement Configuration

Posted by Michael Carey <mj...@ics.uci.edu>.
Do I hear 30?  (And ~1.6 mins?)
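Carey's "~1.6 mins" appears to be a back-of-the-envelope extrapolation from Rana's report of ~3.2 minutes with 15 partitions, assuming ideal linear speedup (doubling the partitions halves the time). Below is a minimal Python sketch of that estimate. `linear_speedup_estimate` is a hypothetical helper. The model ignores serial work, the single shared disk, and result printing, so it is best read as an optimistic bound.

```python
def linear_speedup_estimate(t_base: float, p_base: int, p_new: int) -> float:
    """Estimated runtime at p_new partitions, assuming perfectly linear scaling.

    t_base is the measured runtime (any unit) with p_base partitions.
    Real speedups are usually smaller (serial work, shared I/O).
    """
    if p_base <= 0 or p_new <= 0:
        raise ValueError("partition counts must be positive")
    return t_base * p_base / p_new


# Rana's reported result: ~3.2 minutes with 15 partitions.
print(linear_speedup_estimate(3.2, 15, 30))
```

As a sanity check, the same model applied to the earlier ~6 minutes on 4 partitions would have predicted 1.6 minutes at 15 partitions, whereas Rana measured ~3.2. The scaling seen in this thread was therefore well short of linear.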


On 1/29/18 12:28 AM, Rana Alotaibi wrote:
> Thanks, Murtadha, for your informative email. I now have 15 partitions
> (~15 cores were utilized as well), which reduced the
> execution time. The query execution time is now ~3.2 mins :).
>
> --Rana
>
>
>
> On Sun, Jan 28, 2018 at 8:29 PM, Murtadha Hubail <hubailmor@gmail.com>
> wrote:
>
>     If reloading the data isn’t too much trouble, the first thing I
>     would do is recreate the instance with more partitions (e.g., one
>     partition per core or one partition per 2 cores) and check the core
>     utilization. If this is the same dataset as the one in your
>     previous email, where you mentioned it was about 10GB per
>     partition, you might want to allocate at least 40GB for the
>     buffer cache and reduce storage.memorycomponent.globalbudget to
>     leave enough memory to execute the job (depending on the number
>     of partitions you create). After recreating the instance with a
>     higher number of partitions, don’t use “SET `compiler.parallelism`
>     "39"”; the number of partitions you create will be used
>     automatically.
>
>     Regarding the metrics times: they include the time to print the
>     results, so to see whether printing has any impact, try adding
>     “limit 1” at the end of your query, or change it to select
>     count(*) instead of subject_id.
>
>     Cheers,
>
>     Murtadha
>
>     From: Rana Alotaibi <ralotaib@eng.ucsd.edu>
>     Date: Monday, 29 January 2018 at 6:48 AM
>     To: <hubailmor@gmail.com>
>     Cc: <users@asterixdb.apache.org>, <dev@asterixdb.apache.org>
>     Subject: Re: Hyracks Job Requirement Configuration
>
>     *- Do you see all cores being fully utilized during the query
>     execution?*
>
>     I noticed that only 6 cores were utilized.
>
>     *- How much time does the query take right now and how do you
>     measure the query execution time? Do you wait for the result to be
>     printed somewhere (e.g. in the browser)?*
>
>     I'm using the HTTP API. The response is a JSON object that
>     includes the query execution time:
>
>     {
>         "status": "success",
>         "metrics": {
>             "elapsedTime": "434.627299814s",
>             "executionTime": "434.626137977s",
>             "resultCount": 4943,
>             "resultSize": 132293,
>             "processedObjects": 46875
>         }
>     }
>
>     I ran the query 10 times and took the average, which is ~6 mins.
>
>     *- You mentioned that you have 4 partitions; how many physical
>     hard drives are they mapped to?*
>
>     One physical hard drive.
>
>     *- Also, increasing the sort/join memory doesn’t necessarily lead
>     to better performance. Have you tried changing these values to
>     something smaller and seeing the effects?*
>
>       Yes, I tried the following numbers:
>
>       1) sort-memory: 32MB, join-memory: 64MB
>
>       2) sort-memory: 64MB, join-memory: 128MB
>
>       3) sort-memory: 128MB, join-memory: 265MB
>
>     The execution time remains ~6-6.5 mins on average; I didn't see
>     any improvement. My current configuration:
>
>     - compiler.parallelism: 39 (only 6 cores were utilized)
>
>     - storage.buffercache.size: 20GB
>
>     - storage.buffercache.pagesize: 1MB
>
>     Thanks,
>
>     Rana
>
>     On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail
>     <hu...@gmail.com> wrote:
>
>         I have a few questions, if you don’t mind:
>
>         Do you see all cores being fully utilized during the query
>         execution?
>
>         How much time does the query take right now and how do you
>         measure the query execution time? Do you wait for the result
>         to be printed somewhere (e.g. in the browser)?
>
>         You mentioned that you have 4 partitions; how many physical
>         hard drives are they mapped to?
>
>         Also, increasing the sort/join memory doesn’t necessarily lead
>         to better performance. Have you tried changing these values
>         to something smaller and seeing the effects?
>
>         Cheers,
>
>         Murtadha
>
>         From: Rana Alotaibi <ra...@eng.ucsd.edu>
>         Date: Monday, 29 January 2018 at 5:21 AM
>         To: <hu...@gmail.com>
>         Cc: <us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
>         Subject: Re: Hyracks Job Requirement Configuration
>
>         Thanks, Murtadha! That solved the problem. However, increasing
>         the number of cores didn't improve the performance of that
>         query.
>
>         On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail
>         <hu...@gmail.com> wrote:
>
>             Hi Rana,
>
>             The memory used for query processing is automatically
>             calculated as follows:
>             JVM Max Memory - storage.buffercache.size -
>             storage.memorycomponent.globalbudget
>
>             The documented defaults for these parameters are
>             outdated. The default value of storage.buffercache.size
>             is (JVM Max Memory / 4), and the same holds for
>             storage.memorycomponent.globalbudget. Since your dataset
>             is already loaded, you could reduce
>             storage.memorycomponent.globalbudget. In addition, if I
>             recall correctly, your dataset is much smaller than
>             what's allocated for the buffer cache, so you might want
>             to reduce the buffer cache budget. That should give you
>             more than enough memory to execute on 39 cores.
>
>             Cheers,
>             Murtadha
>
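Murtadha's formula can be written down as a short Python sketch. It encodes only what his message says: query memory = JVM max memory - storage.buffercache.size - storage.memorycomponent.globalbudget, with both storage settings defaulting to a quarter of the JVM max memory (per his note on the version discussed here). The defaults therefore leave half the heap for queries. The function names and the 96 GiB heap are hypothetical. Treating this budget as the "capacity" reported by HYR0009 is also an assumption.

```python
from typing import Optional

GIB = 1024 ** 3


def query_memory_budget(jvm_max: int,
                        buffercache_size: Optional[int] = None,
                        memcomponent_budget: Optional[int] = None) -> int:
    """Bytes left for query processing, per the formula in the thread.

    buffercache_size    -> storage.buffercache.size (default: jvm_max / 4)
    memcomponent_budget -> storage.memorycomponent.globalbudget (default: jvm_max / 4)
    """
    if buffercache_size is None:
        buffercache_size = jvm_max // 4
    if memcomponent_budget is None:
        memcomponent_budget = jvm_max // 4
    remaining = jvm_max - buffercache_size - memcomponent_budget
    if remaining < 0:
        raise ValueError("storage budgets exceed the JVM max memory")
    return remaining


def fits(job_bytes: int, capacity_bytes: int) -> bool:
    """True if a job's memory requirement fits in the capacity (cf. HYR0009)."""
    return job_bytes <= capacity_bytes


# Hypothetical 96 GiB heap with default settings: half is left for queries.
default_budget = query_memory_budget(96 * GIB)
# Larger buffer cache (40 GiB) but a much smaller memory-component budget.
tuned_budget = query_memory_budget(96 * GIB, buffercache_size=40 * GIB,
                                   memcomponent_budget=4 * GIB)
print(default_budget // GIB, tuned_budget // GIB)  # -> 48 52
```

Following his advice, the tuned call keeps a 40 GiB buffer cache but shrinks the memory-component budget. This still leaves more room for queries than the defaults do.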


Re: Hyracks Job Requirement Configuration

Posted by Rana Alotaibi <ra...@eng.ucsd.edu>.
Thanks, Murtadha, for your informative email. I now have 15 partitions (~15
cores were utilized as well), which reduced the execution time.
The query execution time is now ~3.2 mins :).

--Rana
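Rana timed the query by reading the `metrics` object returned by the HTTP API and averaging 10 runs. Murtadha noted that these times include result printing. Below is a minimal Python sketch of that measurement loop. It assumes the AsterixDB query service endpoint at `http://localhost:19002/query/service`, which accepts a form-encoded `statement` parameter (adjust host and port for your cluster). It also assumes duration strings carry unit suffixes like `s` or `ms`; extend `_UNITS` if your server reports others. `COUNT_VARIANT` follows Murtadha's suggestion of selecting `count(*)` to take result printing out of the picture. All function names here are hypothetical.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumed endpoint of the AsterixDB query service; adjust host/port.
SERVICE_URL = "http://localhost:19002/query/service"

# Count instead of returning rows, per Murtadha's suggestion.
COUNT_VARIANT = """
USE mimiciii;
SELECT COUNT(*) AS cnt
FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
WHERE  E.ITEMID/*+bcast*/=I.ITEMID
  AND  E.FLAG = 'abnormal'
  AND  I.FLUID = 'Blood'
  AND  I.LABEL = 'Haptoglobin';
"""

# Unit suffixes assumed to appear in strings such as "434.627299814s".
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_seconds(value: str) -> float:
    """Convert a metrics duration string (e.g. '434.62s', '12.5ms') to seconds."""
    # Check longer suffixes first so "ms" is not mistaken for "s".
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * _UNITS[suffix]
    raise ValueError(f"unrecognized duration: {value!r}")


def run_statement(statement: str, url: str = SERVICE_URL) -> dict:
    """POST one request to the query service and return the decoded JSON."""
    data = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)


def mean_elapsed(statement: str, runs: int = 10) -> float:
    """Average elapsedTime (seconds) over several sequential runs."""
    times = [parse_seconds(run_statement(statement)["metrics"]["elapsedTime"])
             for _ in range(runs)]
    return statistics.mean(times)
```

Comparing `mean_elapsed(COUNT_VARIANT)` with the same loop over the original SELECT gives a rough idea of how much of the measured time goes to producing and printing the 4,943 result rows. The runs are sequential, so the first (cold-cache) run is included in the average; drop it if you want warm-cache numbers.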




Re: Hyracks Job Requirement Configuration

Posted by Michael Carey <mj...@ics.uci.edu>.
Rana's work shows a clear user requirement (@Xikui pay attention :-)):
we need two forms of parallelism hint. One does what we currently do,
which is to widen the parallelism AFTER reading from storage, at the
first opportunity to do so; the other widens it IMMEDIATELY (somehow
:-)). The latter is clearly what Rana would ideally have been able to
use, so that she wouldn't have had to change the data layout to get
more parallelism. Food for thought.



Re: Hyracks Job Requirement Configuration

Posted by Michael Carey <mj...@ics.uci.edu>.
Rana's work shows a clear user requirement (@Xikui pay attention :-)) -- 
we need two forms of parallelism hint, one that does what we currently 
do - which is widen the parallelism AFTER reading from storage at the 
first opportunity to do so - and another that widens it IMMEDIATELY 
(somehow :-)). The latter is clearly what Rana would ideally have been 
able to make use of, so she wouldn't have to change the data layout to 
get more parallelism. Food for thought.


On 1/28/18 8:29 PM, Murtadha Hubail wrote:
>
> If reloading the data isn’t too much trouble, the first thing I would 
> do is recreate the instance with more partitions (e.g. partition per 
> core or partition per 2 cores) and check the cores utilization. If 
> this is the same dataset as the one in your previous email, you 
> mentioned that it was about 10GB per partition, in that case, you 
> might want to allocate at least 40GB for the buffer cache and you can 
> reduce storage.memorycomponent.globalbudget to get enough memory to 
> execute the job (depending on the number of partitions you create). 
> After recreating with higher number of partitions, don’t use “SET 
> `compiler.parallelism` "39"”. It will automatically use the number of 
> partitions you create.
>
> Regarding the metrics time, it includes the results printing time, so 
> if you want to see if it has any impact, try adding “limit 1” at the 
> end of your query or change it to select count(*) instead of subject_id.
>
> Cheers,
>
> Murtadha
>
> *From: *Rana Alotaibi <ra...@eng.ucsd.edu>
> *Date: *Monday, 29 January 2018 at 6:48 AM
> *To: *<hu...@gmail.com>
> *Cc: *<us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
> *- Do you see all cores being fully utilized during the query execution? *
>
> **I have noticed only 6 cores were utilized
> *- How much time does the query take right now and how do you measure 
> the query execution time? Do you wait for the result to be printed 
> somewhere (e.g. in the browser)?*
>
> I'm using the HTTP APIs. The response is a JSON object that includes 
> the query execution time:
>
>    { "status": "success",
>         "metrics": {
> *"elapsedTime": "434.627299814s",
>                 "executionTime": "434.626137977s",*
>                 "resultCount": 4943,
>                 "resultSize": 132293,-
>                 "processedObjects": 46875
>         }
> }
>
> I run the query 10 times and took the average which is ~6mins.
>
> *- You mentioned that you have 4 partitions, how many physical hard 
> drives are they mapped to?*
>
> **One physical hard drive
>
> *- Also, increasing the sort/join memory doesn’t necessarily lead to a 
> better performance. Have you tried changing these values to something 
> smaller and seeing the effects?*
>
>   Yes, I tried the following numbers:
>
>   1) sort-memory: 32MB, join-memory: 64MB
>
>   2) sort-memory: 64MB, join-memory: 128MB
>
>   3) sort-memory: 128MB, join-memory:  265MB
>
> The execution time remains on average ~6 - 6.5mins. I didn't see any 
> improvement. The configurations that I have now:
>
> - compiler.parallelism :39 //Only 6 were utilized
>
> - storage.buffercache.size: 20GB
>
> - storage.buffercache.pagesize: 1MB
>
> Thanks,
>
> Rana
>
> On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail 
> <hu...@gmail.com>> wrote:
>
>     I have few questions if you don’t mind:
>
>     Do you see all cores being fully utilized during the query execution?
>
>     How much time does the query take right now and how do you measure
>     the query execution time? Do you wait for the result to be printed
>     somewhere (e.g. in the browser)?
>
>     You mentioned that you have 4 partitions, how many physical hard
>     drives are they mapped to?
>
>     Also, increasing the sort/join memory doesn’t necessarily lead to
>     a better performance. Have you tried changing these values to
>     something smaller and seeing the effects?
>
>     Cheers,
>
>     Murtadha
>
>     *From: *Rana Alotaibi
>     <ra...@eng.ucsd.edu>>
>     *Date: *Monday, 29 January 2018 at 5:21 AM
>     *To: *<hu...@gmail.com>>
>     *Cc:
>     *<us...@asterixdb.apache.org>>,
>     <de...@asterixdb.apache.org>>
>     *Subject: *Re: Hyracks Job Requirement Configuration
>
>     Thanks Murtadha! The problem solved. However, increasing the
>     number of cores didn't help to improve the performance of that query.
>
>     On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail
>     <hu...@gmail.com>> wrote:
>
>         Hi Rana,
>
>         The memory used for query processing is automatically
>         calculated as follows:
>         JVM Max Memory - storage.buffercache.size -
>         storage.memorycomponent.globalbudget
>
>         The documentation defaults for these parameters are outdated.
>         The default value for storage.buffercache.size is (JVM Max
>         Memory / 4) and it's the same for
>         storage.memorycomponent.globalbudget. Since your dataset is
>         already loaded, you could reduce the budget of
>         storage.memorycomponent.globalbudget. In addition, if I recall
>         correctly, your dataset size is way smaller than what's
>         allocated for the buffer cache, so you might want to reduce
>         the buffer cache budget. That should give you more than enough
>         memory to execute on 39 cores.
>
>         Cheers,
>         Murtadha
>
>
>         On 01/29/2018, 3:30 AM, "Mike Carey"
>         <dt...@gmail.com>> wrote:
>
>             + dev
>
>
>             On 1/28/18 3:37 PM, Rana Alotaibi wrote:
>             > Hi all,
>             >
>             > I would like to make AsterixDB utilizes all available
>         CPU cores (39)
>             > that I have for the following query:
>             >
>             > USE mimiciii;
>             > SET `compiler.parallelism` "39";
>             > SET `compiler.sortmemory` "128MB";
>             > SET `compiler.joinmemory` "265MB";
>             > SELECT P.SUBJECT_ID
>             > FROM  LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>             > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
>             >   E.FLAG = 'abnormal' AND
>             >   I.FLUID='Blood' AND
>             > I.LABEL='Haptoglobin'
>             >
>             >
>             > The total memory size that I have is 125GB(57GB for the
>         AsterixDB
>             > buffer cache). By running the above query, I got the
>         following error:
>             >
>             > "msg": "HYR0009: Job requirement (memory: 10705403904
>         bytes, CPU
>             > cores: 39) exceeds capacity (memory:
>         3258744832<tel:%28325%29%20874-4832>bytes, CPU cores: 39)"
>             >
>             > How can I change this capacity default configuration?
>         I'm looking into
>             > this page :
>         https://asterixdb.apache.org/docs/0.9.2/ncservice.html.
>             > Could you please point me to the appropriate
>         configuration parameter?
>             >
>             > Thanks
>             > -- Rana
>             >
>             >
>             >
>             >
>
>


Re: Hyracks Job Requirement Configuration

Posted by Rana Alotaibi <ra...@eng.ucsd.edu>.
Thanks Murtadha for your informative email. I have now 15 partitions (~15
cores were utilized as well), and it helps to reduce the execution time.
The query execution time now is ~3.2 mins :).

--Rana



On Sun, Jan 28, 2018 at 8:29 PM, Murtadha Hubail <hu...@gmail.com>
wrote:

> If reloading the data isn’t too much trouble, the first thing I would do
> is recreate the instance with more partitions (e.g. partition per core or
> partition per 2 cores) and check the cores utilization. If this is the same
> dataset as the one in your previous email, you mentioned that it was about
> 10GB per partition, in that case, you might want to allocate at least 40GB
> for the buffer cache and you can reduce storage.memorycomponent.globalbudget
> to get enough memory to execute the job (depending on the number of
> partitions you create). After recreating with higher number of partitions,
> don’t use “SET `compiler.parallelism` "39"”. It will automatically use the
> number of partitions you create.
>
>
>
> Regarding the metrics time, it includes the results printing time, so if
> you want to see if it has any impact, try adding “limit 1” at the end of
> your query or change it to select count(*) instead of subject_id.
>
>
>
> Cheers,
>
> Murtadha
>
>
>
> *From: *Rana Alotaibi <ra...@eng.ucsd.edu>
> *Date: *Monday, 29 January 2018 at 6:48 AM
>
> *To: *<hu...@gmail.com>
> *Cc: *<us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
>
>
> *- Do you see all cores being fully utilized during the query execution? *
>
>  I have noticed only 6 cores were utilized
> *- How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?*
>
> I'm using the HTTP APIs. The response is a JSON object that includes the
> query execution time:
>
>    { "status": "success",
>         "metrics": {
>                 "elapsedTime": "434.627299814s",
>                 "executionTime": "434.626137977s",
>                 "resultCount": 4943,
>                 "resultSize": 132293,
>                 "processedObjects": 46875
>         }
> }
>
> I ran the query 10 times and took the average, which is ~6 mins.
>
> *- You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?*
>
>  One physical hard drive
>
> *- Also, increasing the sort/join memory doesn’t necessarily lead to a
> better performance. Have you tried changing these values to something
> smaller and seeing the effects?*
>
>   Yes, I tried the following numbers:
>
>   1) sort-memory: 32MB, join-memory: 64MB
>
>   2) sort-memory: 64MB, join-memory: 128MB
>
>   3) sort-memory: 128MB, join-memory:  265MB
>
>
>
> The execution time remains ~6-6.5 mins on average; I didn't see any
> improvement. The configuration I have now:
>
> - compiler.parallelism: 39 (only 6 cores were utilized)
>
> - storage.buffercache.size: 20GB
>
> - storage.buffercache.pagesize: 1MB
>
>
>
> Thanks,
>
> Rana
>
> On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail <hu...@gmail.com>
> wrote:
>
> I have a few questions, if you don’t mind:
>
> Do you see all cores being fully utilized during the query execution?
>
> How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?
>
> You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?
>
> Also, increasing the sort/join memory doesn’t necessarily lead to a better
> performance. Have you tried changing these values to something smaller and
> seeing the effects?
>
>
>
> Cheers,
>
> Murtadha
>
>
>
> *From: *Rana Alotaibi <ra...@eng.ucsd.edu>
> *Date: *Monday, 29 January 2018 at 5:21 AM
> *To: *<hu...@gmail.com>
> *Cc: *<us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
>
>
> Thanks Murtadha! The problem is solved. However, increasing the number of
> cores didn't improve the performance of that query.
>
> On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <hu...@gmail.com>
> wrote:
>
> Hi Rana,
>
> The memory used for query processing is automatically calculated as
> follows:
> JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget
>
> The documentation defaults for these parameters are outdated. The default
> value for storage.buffercache.size is (JVM Max Memory / 4) and it's the
> same for storage.memorycomponent.globalbudget. Since your dataset is
> already loaded, you could reduce the budget of storage.memorycomponent.globalbudget.
> In addition, if I recall correctly, your dataset size is way smaller than
> what's allocated for the buffer cache, so you might want to reduce the
> buffer cache budget. That should give you more than enough memory to
> execute on 39 cores.
>
> Cheers,
> Murtadha
>
>
> On 01/29/2018, 3:30 AM, "Mike Carey" <dt...@gmail.com> wrote:
>
>     + dev
>
>
>     On 1/28/18 3:37 PM, Rana Alotaibi wrote:
>     > Hi all,
>     >
>     > I would like to make AsterixDB utilize all available CPU cores (39)
>     > that I have for the following query:
>     >
>     > USE mimiciii;
>     > SET `compiler.parallelism` "39";
>     > SET `compiler.sortmemory` "128MB";
>     > SET `compiler.joinmemory` "265MB";
>     > SELECT P.SUBJECT_ID
>     > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>     > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
>     >              E.FLAG = 'abnormal' AND
>     >              I.FLUID='Blood' AND
>     >              I.LABEL='Haptoglobin'
>     >
>     >
>     > The total memory size that I have is 125GB (57GB for the AsterixDB
>     > buffer cache). By running the above query, I got the following error:
>     >
>     > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
>     > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
>     >
>     > How can I change this capacity default configuration? I'm looking into
>     > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
>     > Could you please point me to the appropriate configuration parameter?
>     >
>     > Thanks
>     > -- Rana
>     >
>     >
>     >
>     >
>
>
>
>
>
>

Re: Hyracks Job Requirement Configuration

Posted by Murtadha Hubail <hu...@gmail.com>.
If reloading the data isn’t too much trouble, the first thing I would do is recreate the instance with more partitions (e.g., one partition per core or one per 2 cores) and check core utilization. If this is the same dataset as the one in your previous email, you mentioned that it was about 10GB per partition. In that case, you might want to allocate at least 40GB for the buffer cache and reduce storage.memorycomponent.globalbudget to get enough memory to execute the job (depending on the number of partitions you create). After recreating with a higher number of partitions, don’t use “SET `compiler.parallelism` "39"”; AsterixDB will automatically use the number of partitions you create.
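A rough planning sketch of this advice, in Python. Everything here is an illustrative assumption: the helper name `plan_instance`, the 100 GiB JVM heap, the 10 GiB memory-component budget, and the reading of "at least 40GB" as "size the buffer cache to hold the whole dataset (4 partitions at ~10GB each)". It applies the query-memory formula stated in this thread (JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget) and is not AsterixDB code.

```python
from dataclasses import dataclass

GiB = 1024 ** 3


@dataclass
class InstancePlan:
    partitions: int
    buffer_cache_bytes: int
    query_memory_bytes: int


def plan_instance(jvm_max_bytes, dataset_bytes, cores, cores_per_partition,
                  global_budget_bytes):
    """Hypothetical helper: one partition per `cores_per_partition` cores,
    a buffer cache large enough for the whole dataset (assumption), and
    query memory = JVM max - buffer cache - memory-component global budget."""
    if cores_per_partition < 1:
        raise ValueError("cores_per_partition must be >= 1")
    partitions = max(1, cores // cores_per_partition)
    buffer_cache = dataset_bytes
    query_memory = jvm_max_bytes - buffer_cache - global_budget_bytes
    if query_memory <= 0:
        raise ValueError("storage budgets leave no memory for query processing")
    return InstancePlan(partitions, buffer_cache, query_memory)


# Assumed numbers: 100 GiB heap, ~40 GiB dataset, 39 cores, 2 cores per partition.
plan = plan_instance(100 * GiB, 40 * GiB, 39, 2, 10 * GiB)
print(plan.partitions, plan.buffer_cache_bytes // GiB, plan.query_memory_bytes // GiB)
```

With these assumed inputs the sketch yields 19 partitions, a 40 GiB buffer cache, and 50 GiB left for query processing; shrinking the global budget is the knob that frees more memory without touching the buffer cache.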

 

Regarding the metrics: the reported time includes the result printing time, so to see whether printing has any impact, try adding “limit 1” at the end of your query, or change it to select count(*) instead of subject_id.

 

Cheers,

Murtadha

 

From: Rana Alotaibi <ra...@eng.ucsd.edu>
Date: Monday, 29 January 2018 at 6:48 AM
To: <hu...@gmail.com>
Cc: <us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
Subject: Re: Hyracks Job Requirement Configuration

 

- Do you see all cores being fully utilized during the query execution? 

 I noticed that only 6 cores were utilized.
- How much time does the query take right now and how do you measure the query execution time? Do you wait for the result to be printed somewhere (e.g. in the browser)?

I'm using the HTTP APIs. The response is a JSON object that includes the query execution time:

   { "status": "success",
        "metrics": {
                "elapsedTime": "434.627299814s",
                "executionTime": "434.626137977s",
                "resultCount": 4943,
                "resultSize": 132293,
                "processedObjects": 46875
        }
}

I ran the query 10 times and took the average, which is ~6 mins.
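Averaging the reported times over repeated runs, as described here, can be scripted. This is a hypothetical sketch: it only parses response bodies you have already saved (it does not contact the server), and it assumes `elapsedTime` uses the bare seconds suffix seen in this thread (e.g. "434.627299814s"); any other form is rejected rather than guessed. As noted earlier in the thread, this time includes result printing, so it is worth comparing against a count(*) or "limit 1" variant of the query.

```python
import json
import statistics


def parse_seconds(duration):
    """Parse a duration such as "434.627299814s" into float seconds.

    Only a bare "s" suffix is accepted (assumption based on the sample in
    this thread); anything else raises ValueError.
    """
    if not duration.endswith("s") or duration.endswith("ms"):
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(duration[:-1])


def mean_elapsed_seconds(response_bodies):
    """Average metrics.elapsedTime over several saved JSON response bodies."""
    times = [
        parse_seconds(json.loads(body)["metrics"]["elapsedTime"])
        for body in response_bodies
    ]
    return statistics.mean(times)


sample = ('{"status": "success", "metrics": {"elapsedTime": "434.627299814s", '
          '"executionTime": "434.626137977s", "resultCount": 4943}}')
print(round(mean_elapsed_seconds([sample]) / 60, 1), "minutes")
```

For the single sample response above, the elapsed time works out to about 7.2 minutes; averaging the full set of runs is what produces a figure like the ~6 minutes reported.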

- You mentioned that you have 4 partitions, how many physical hard drives are they mapped to?

 One physical hard drive

- Also, increasing the sort/join memory doesn’t necessarily lead to a better performance. Have you tried changing these values to something smaller and seeing the effects?

  Yes, I tried the following numbers:

  1) sort-memory: 32MB, join-memory: 64MB

  2) sort-memory: 64MB, join-memory: 128MB

  3) sort-memory: 128MB, join-memory:  265MB

 

The execution time remains ~6-6.5 mins on average; I didn't see any improvement. The configuration I have now:

- compiler.parallelism: 39 (only 6 cores were utilized)

- storage.buffercache.size: 20GB

- storage.buffercache.pagesize: 1MB

 

Thanks,

Rana

On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail <hu...@gmail.com> wrote:

I have a few questions, if you don’t mind:

Do you see all cores being fully utilized during the query execution? 

How much time does the query take right now and how do you measure the query execution time? Do you wait for the result to be printed somewhere (e.g. in the browser)?

You mentioned that you have 4 partitions, how many physical hard drives are they mapped to?

Also, increasing the sort/join memory doesn’t necessarily lead to a better performance. Have you tried changing these values to something smaller and seeing the effects?

 

Cheers,

Murtadha

 

From: Rana Alotaibi <ra...@eng.ucsd.edu>
Date: Monday, 29 January 2018 at 5:21 AM
To: <hu...@gmail.com>
Cc: <us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
Subject: Re: Hyracks Job Requirement Configuration

 

Thanks Murtadha! The problem is solved. However, increasing the number of cores didn't improve the performance of that query.

On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <hu...@gmail.com> wrote:

Hi Rana,

The memory used for query processing is automatically calculated as follows:
JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget

The documentation defaults for these parameters are outdated. The default value for storage.buffercache.size is (JVM Max Memory / 4) and it's the same for storage.memorycomponent.globalbudget. Since your dataset is already loaded, you could reduce the budget of storage.memorycomponent.globalbudget. In addition, if I recall correctly, your dataset size is way smaller than what's allocated for the buffer cache, so you might want to reduce the buffer cache budget. That should give you more than enough memory to execute on 39 cores.
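To make the formula concrete, here is a small Python sketch. It is a simplified model under stated assumptions, not AsterixDB's actual code: it uses only the three terms named above, applies the JVM Max Memory / 4 default to either storage setting when no value is given, and mimics the HYR0009 check as a plain `required <= capacity` comparison. The function names and the 64 GiB heap are hypothetical. In the original error, the job asked for about 10 GiB (10705403904 bytes) while only about 3 GiB (3258744832 bytes) was available.

```python
GiB = 1024 ** 3


def query_memory_capacity(jvm_max, buffer_cache=None, global_budget=None):
    """Simplified model of the formula quoted in this thread:
    JVM max memory - storage.buffercache.size - storage.memorycomponent.globalbudget.
    An omitted setting falls back to the default described here (jvm_max / 4)."""
    if buffer_cache is None:
        buffer_cache = jvm_max // 4
    if global_budget is None:
        global_budget = jvm_max // 4
    return jvm_max - buffer_cache - global_budget


def job_fits(required_bytes, capacity_bytes):
    """Hypothetical stand-in for the admission check behind HYR0009."""
    return required_bytes <= capacity_bytes


# Numbers from the HYR0009 error in this thread: the job did not fit.
print(job_fits(10705403904, 3258744832))  # False

# Assumed 64 GiB heap with both defaults: half the heap is left for queries.
print(query_memory_capacity(64 * GiB) // GiB)  # 32

# Shrinking both storage budgets (assumed values) frees more room.
print(query_memory_capacity(64 * GiB, buffer_cache=20 * GiB,
                            global_budget=4 * GiB) // GiB)  # 40
```

Under this model, the defaults leave only half the heap for query processing, which is why lowering the buffer cache and the memory-component budget, rather than raising sort or join memory, is what clears the error.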

Cheers,
Murtadha


On 01/29/2018, 3:30 AM, "Mike Carey" <dt...@gmail.com> wrote:

    + dev


    On 1/28/18 3:37 PM, Rana Alotaibi wrote:
    > Hi all,
    >
    > I would like to make AsterixDB utilize all available CPU cores (39)
    > that I have for the following query:
    >
    > USE mimiciii;
    > SET `compiler.parallelism` "39";
    > SET `compiler.sortmemory` "128MB";
    > SET `compiler.joinmemory` "265MB";
    > SELECT P.SUBJECT_ID
    > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
    > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
    >              E.FLAG = 'abnormal' AND
    >              I.FLUID='Blood' AND
    >              I.LABEL='Haptoglobin'
    >
    >
    > The total memory size that I have is 125GB (57GB for the AsterixDB
    > buffer cache). By running the above query, I got the following error:
    >
    > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
    > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
    >
    > How can I change this capacity default configuration? I'm looking into
    > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
    > Could you please point me to the appropriate configuration parameter?
    >
    > Thanks
    > -- Rana
    >
    >
    >
    >



 

 


Re: Hyracks Job Requirement Configuration

Posted by Rana Alotaibi <ra...@eng.ucsd.edu>.
*- Do you see all cores being fully utilized during the query execution? *
 I noticed that only 6 cores were utilized.

*- How much time does the query take right now and how do you measure the
query execution time? Do you wait for the result to be printed somewhere
(e.g. in the browser)?*
I'm using the HTTP APIs. The response is a JSON object that includes the
query execution time:
   { "status": "success",
        "metrics": {
                "elapsedTime": "434.627299814s",
                "executionTime": "434.626137977s",
                "resultCount": 4943,
                "resultSize": 132293,
                "processedObjects": 46875
        }
}
I ran the query 10 times and took the average, which is ~6 mins.

*- You mentioned that you have 4 partitions, how many physical hard drives
are they mapped to?*
 One physical hard drive
*- Also, increasing the sort/join memory doesn’t necessarily lead to a
better performance. Have you tried changing these values to something
smaller and seeing the effects?*
  Yes, I tried the following numbers:
  1) sort-memory: 32MB, join-memory: 64MB
  2) sort-memory: 64MB, join-memory: 128MB
  3) sort-memory: 128MB, join-memory:  265MB

The execution time remains ~6-6.5 mins on average; I didn't see any
improvement. The configuration I have now:
- compiler.parallelism: 39 (only 6 cores were utilized)
- storage.buffercache.size: 20GB
- storage.buffercache.pagesize: 1MB

Thanks,
Rana
On Sun, Jan 28, 2018 at 6:41 PM, Murtadha Hubail <hu...@gmail.com>
wrote:

> I have a few questions, if you don’t mind:
>
> Do you see all cores being fully utilized during the query execution?
>
> How much time does the query take right now and how do you measure the
> query execution time? Do you wait for the result to be printed somewhere
> (e.g. in the browser)?
>
> You mentioned that you have 4 partitions, how many physical hard drives
> are they mapped to?
>
> Also, increasing the sort/join memory doesn’t necessarily lead to a better
> performance. Have you tried changing these values to something smaller and
> seeing the effects?
>
>
>
> Cheers,
>
> Murtadha
>
>
>
> *From: *Rana Alotaibi <ra...@eng.ucsd.edu>
> *Date: *Monday, 29 January 2018 at 5:21 AM
> *To: *<hu...@gmail.com>
> *Cc: *<us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
> *Subject: *Re: Hyracks Job Requirement Configuration
>
>
>
> Thanks Murtadha! The problem is solved. However, increasing the number of
> cores didn't improve the performance of that query.
>
> On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <hu...@gmail.com>
> wrote:
>
> Hi Rana,
>
> The memory used for query processing is automatically calculated as
> follows:
> JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget
>
> The documentation defaults for these parameters are outdated. The default
> value for storage.buffercache.size is (JVM Max Memory / 4) and it's the
> same for storage.memorycomponent.globalbudget. Since your dataset is
> already loaded, you could reduce the budget of storage.memorycomponent.globalbudget.
> In addition, if I recall correctly, your dataset size is way smaller than
> what's allocated for the buffer cache, so you might want to reduce the
> buffer cache budget. That should give you more than enough memory to
> execute on 39 cores.
>
> Cheers,
> Murtadha
>
>
> On 01/29/2018, 3:30 AM, "Mike Carey" <dt...@gmail.com> wrote:
>
>     + dev
>
>
>     On 1/28/18 3:37 PM, Rana Alotaibi wrote:
>     > Hi all,
>     >
>     > I would like to make AsterixDB utilize all available CPU cores (39)
>     > that I have for the following query:
>     >
>     > USE mimiciii;
>     > SET `compiler.parallelism` "39";
>     > SET `compiler.sortmemory` "128MB";
>     > SET `compiler.joinmemory` "265MB";
>     > SELECT P.SUBJECT_ID
>     > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>     > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
>     >              E.FLAG = 'abnormal' AND
>     >              I.FLUID='Blood' AND
>     >              I.LABEL='Haptoglobin'
>     >
>     >
>     > The total memory size that I have is 125GB (57GB for the AsterixDB
>     > buffer cache). By running the above query, I got the following error:
>     >
>     > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
>     > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
>     >
>     > How can I change this capacity default configuration? I'm looking into
>     > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
>     > Could you please point me to the appropriate configuration parameter?
>     >
>     > Thanks
>     > -- Rana
>     >
>     >
>     >
>     >
>
>
>
>
>

Re: Hyracks Job Requirement Configuration

Posted by Murtadha Hubail <hu...@gmail.com>.
I have a few questions, if you don’t mind:

Do you see all cores being fully utilized during the query execution? 

How much time does the query take right now and how do you measure the query execution time? Do you wait for the result to be printed somewhere (e.g. in the browser)?

You mentioned that you have 4 partitions, how many physical hard drives are they mapped to?

Also, increasing the sort/join memory doesn’t necessarily lead to a better performance. Have you tried changing these values to something smaller and seeing the effects?

 

Cheers,

Murtadha

 

From: Rana Alotaibi <ra...@eng.ucsd.edu>
Date: Monday, 29 January 2018 at 5:21 AM
To: <hu...@gmail.com>
Cc: <us...@asterixdb.apache.org>, <de...@asterixdb.apache.org>
Subject: Re: Hyracks Job Requirement Configuration

 

Thanks Murtadha! The problem is solved. However, increasing the number of cores didn't improve the performance of that query.

On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <hu...@gmail.com> wrote:

Hi Rana,

The memory used for query processing is automatically calculated as follows:
JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget

The documentation defaults for these parameters are outdated. The default value for storage.buffercache.size is (JVM Max Memory / 4) and it's the same for storage.memorycomponent.globalbudget. Since your dataset is already loaded, you could reduce the budget of storage.memorycomponent.globalbudget. In addition, if I recall correctly, your dataset size is way smaller than what's allocated for the buffer cache, so you might want to reduce the buffer cache budget. That should give you more than enough memory to execute on 39 cores.

Cheers,
Murtadha


On 01/29/2018, 3:30 AM, "Mike Carey" <dt...@gmail.com> wrote:

    + dev


    On 1/28/18 3:37 PM, Rana Alotaibi wrote:
    > Hi all,
    >
    > I would like to make AsterixDB utilize all available CPU cores (39)
    > that I have for the following query:
    >
    > USE mimiciii;
    > SET `compiler.parallelism` "39";
    > SET `compiler.sortmemory` "128MB";
    > SET `compiler.joinmemory` "265MB";
    > SELECT P.SUBJECT_ID
    > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
    > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
    >              E.FLAG = 'abnormal' AND
    >              I.FLUID='Blood' AND
    >              I.LABEL='Haptoglobin'
    >
    >
    > The total memory size that I have is 125GB (57GB for the AsterixDB
    > buffer cache). By running the above query, I got the following error:
    >
    > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
    > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
    >
    > How can I change this capacity default configuration? I'm looking into
    > this page : https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
    > Could you please point me to the appropriate configuration parameter?
    >
    > Thanks
    > -- Rana
    >
    >
    >
    >




 


Re: Hyracks Job Requirement Configuration

Posted by Rana Alotaibi <ra...@eng.ucsd.edu>.
Thanks Murtadha! The problem is solved. However, increasing the number of
cores didn't improve the performance of that query.

On Sun, Jan 28, 2018 at 5:05 PM, Murtadha Hubail <hu...@gmail.com>
wrote:

> Hi Rana,
>
> The memory used for query processing is automatically calculated as
> follows:
> JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget
>
> The documentation defaults for these parameters are outdated. The default
> value for storage.buffercache.size is (JVM Max Memory / 4) and it's the
> same for storage.memorycomponent.globalbudget. Since your dataset is
> already loaded, you could reduce the budget of storage.memorycomponent.globalbudget.
> In addition, if I recall correctly, your dataset size is way smaller than
> what's allocated for the buffer cache, so you might want to reduce the
> buffer cache budget. That should give you more than enough memory to
> execute on 39 cores.
>
> Cheers,
> Murtadha
>
> On 01/29/2018, 3:30 AM, "Mike Carey" <dt...@gmail.com> wrote:
>
>     + dev
>
>
>     On 1/28/18 3:37 PM, Rana Alotaibi wrote:
>     > Hi all,
>     >
>     > I would like to make AsterixDB utilize all available CPU cores (39)
>     > that I have for the following query:
>     >
>     > USE mimiciii;
>     > SET `compiler.parallelism` "39";
>     > SET `compiler.sortmemory` "128MB";
>     > SET `compiler.joinmemory` "265MB";
>     > SELECT P.SUBJECT_ID
>     > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
>     > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
>     >              E.FLAG = 'abnormal' AND
>     >              I.FLUID='Blood' AND
>     >              I.LABEL='Haptoglobin'
>     >
>     >
>     > The total memory size that I have is 125GB (57GB for the AsterixDB
>     > buffer cache). By running the above query, I got the following error:
>     >
>     > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU
>     > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
>     >
>     > How can I change this capacity default configuration? I'm looking into
>     > this page: https://asterixdb.apache.org/docs/0.9.2/ncservice.html .
>     > Could you please point me to the appropriate configuration parameter?
>     >
>     > Thanks
>     > -- Rana
>     >
>     >
>     >
>     >
>
>
>
>
>

Re: Hyracks Job Requirement Configuration

Posted by Murtadha Hubail <hu...@gmail.com>.
Hi Rana,

The memory used for query processing is automatically calculated as follows:
JVM Max Memory - storage.buffercache.size - storage.memorycomponent.globalbudget

The documentation defaults for these parameters are outdated. The default value for storage.buffercache.size is (JVM Max Memory / 4) and it's the same for storage.memorycomponent.globalbudget. Since your dataset is already loaded, you could reduce the budget of storage.memorycomponent.globalbudget. In addition, if I recall correctly, your dataset size is way smaller than what's allocated for the buffer cache, so you might want to reduce the buffer cache budget. That should give you more than enough memory to execute on 39 cores.
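Following the advice to shrink the buffer cache, the same formula can be solved for the largest storage.buffercache.size that still leaves room for a given job. This is a hypothetical sketch. The 100 GiB heap and 4 GiB memory-component budget are made-up values, and the job requirement is the figure from the HYR0009 error quoted below. Comparing one JVM's query memory with the whole job requirement only makes sense for a single-node-controller instance; this thread does not say how the requirement is divided across several nodes.

```python
GiB = 1024 ** 3


def max_buffer_cache(jvm_max, memory_component_budget, job_requirement):
    """Largest storage.buffercache.size (bytes) that still leaves at least
    job_requirement bytes for query processing, assuming
        query memory = jvm_max - buffer_cache - memory_component_budget.
    Hypothetical helper, not part of AsterixDB."""
    limit = jvm_max - memory_component_budget - job_requirement
    if limit <= 0:
        raise ValueError("the job does not fit even with a zero-sized buffer cache")
    return limit


if __name__ == "__main__":
    required = 10705403904   # job requirement quoted in the HYR0009 error
    jvm = 100 * GiB          # hypothetical -Xmx for the node controller
    components = 4 * GiB     # hypothetical reduced globalbudget
    print(round(max_buffer_cache(jvm, components, required) / GiB, 2))  # 86.03
```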

Cheers,
Murtadha

On 01/29/2018, 3:30 AM, "Mike Carey" <dt...@gmail.com> wrote:

    + dev
    
    
    On 1/28/18 3:37 PM, Rana Alotaibi wrote:
    > Hi all,
    >
    > I would like to make AsterixDB utilize all available CPU cores (39)
    > that I have for the following query:
    >
    > USE mimiciii;
    > SET `compiler.parallelism` "39";
    > SET `compiler.sortmemory` "128MB";
    > SET `compiler.joinmemory` "265MB";
    > SELECT P.SUBJECT_ID
    > FROM   LABITEMS I, PATIENTS P, P.ADMISSIONS A, A.LABEVENTS E
    > WHERE E.ITEMID/*+bcast*/=I.ITEMID AND
    >              E.FLAG = 'abnormal' AND
    >              I.FLUID='Blood' AND
    >              I.LABEL='Haptoglobin'
    >
    >
    > The total memory size that I have is 125GB (57GB for the AsterixDB 
    > buffer cache). By running the above query, I got the following error:
    >
    > "msg": "HYR0009: Job requirement (memory: 10705403904 bytes, CPU 
    > cores: 39) exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)"
    >
    > How can I change this capacity default configuration? I'm looking into 
    > this page: https://asterixdb.apache.org/docs/0.9.2/ncservice.html . 
    > Could you please point me to the appropriate configuration parameter?
    >
    > Thanks
    > -- Rana
    >
    >
    >
    >
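The two byte counts in the HYR0009 message are easier to compare in GiB. The sketch below parses the message text as it appears in this thread and applies the check the message implies: a job fits only if both its memory and its core requests are within capacity. This reading is an assumption based on the message wording. It is not the actual Hyracks admission code, and the regular expression may not match other AsterixDB versions.

```python
import re

GiB = 1024 ** 3

# Matches the HYR0009 wording quoted in this thread; \s+ tolerates extra
# whitespace or a bare line break between "CPU" and "cores".
HYR0009 = re.compile(
    r"Job requirement \(memory: (\d+) bytes, CPU\s+cores: (\d+)\) exceeds capacity "
    r"\(memory: (\d+) bytes, CPU\s+cores: (\d+)\)"
)


def parse_hyr0009(msg):
    """Extract requested and available memory/cores from an HYR0009 message."""
    m = HYR0009.search(msg)
    if m is None:
        raise ValueError("not an HYR0009 message")
    req_mem, req_cores, cap_mem, cap_cores = map(int, m.groups())
    return {"required_memory": req_mem, "required_cores": req_cores,
            "capacity_memory": cap_mem, "capacity_cores": cap_cores}


def fits(r):
    """Assumed check: both memory and cores must be within capacity."""
    return (r["required_memory"] <= r["capacity_memory"]
            and r["required_cores"] <= r["capacity_cores"])


if __name__ == "__main__":
    msg = ("HYR0009: Job requirement (memory: 10705403904 bytes, CPU cores: 39) "
           "exceeds capacity (memory: 3258744832 bytes, CPU cores: 39)")
    r = parse_hyr0009(msg)
    print(round(r["required_memory"] / GiB, 2))  # 9.97
    print(round(r["capacity_memory"] / GiB, 2))  # 3.03
    print(fits(r))  # False: memory, not cores, is the bottleneck
```

The core counts match (39 vs. 39), so the rejection comes entirely from the roughly 7 GiB memory shortfall. That is why the fix discussed here is to free heap from the storage budgets rather than to change compiler.parallelism.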
    
    


