Posted to dev@asterixdb.apache.org by Muhammad Abu Bakar Siddique <ms...@ucr.edu> on 2018/02/20 22:53:09 UTC

How to combine two Maps into one?

Hi,
I am trying to code a very simple example that computes a single
histogram from two different files. I am able to compute a separate
histogram for each file using OneToOneConnectorDescriptor. Now, I want
to combine these two maps into one map. I could not find any
MToOneConnector that would let me combine these two maps into one. Can
somebody please guide me on how to do this correctly?
What I did:
1. Created two splits for input files
2. Connected input to myOperatorDescriptor using OneToOneConnectorDescriptor
3. Connected myOperatorDescriptor to the output using
OneToOneConnectorDescriptor
4. myOperatorDescriptor is reading the files and computing the histogram
(in HashMap) for each file
What I need to do:
1. Combine the maps into one.
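
For reference, the combine step itself (independent of how the maps reach one place) can be sketched in plain Java. This is only an illustration, not Hyracks code: the class name HistogramMerge, the method mergeInto, and the String/Long key and value types are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hyracks: merges two histograms kept as maps.
final class HistogramMerge {
    // Adds every count from `source` into `target`, summing the counts of keys
    // that appear in both maps. Returns `target` for convenience.
    static Map<String, Long> mergeInto(Map<String, Long> target, Map<String, Long> source) {
        for (Map.Entry<String, Long> e : source.entrySet()) {
            target.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return target;
    }

    public static void main(String[] args) {
        // Made-up histograms standing in for the two input files.
        Map<String, Long> fileA = new HashMap<>();
        fileA.put("a", 2L);
        fileA.put("b", 1L);

        Map<String, Long> fileB = new HashMap<>();
        fileB.put("b", 4L);
        fileB.put("c", 3L);

        // Copy fileA first so neither input map is modified.
        Map<String, Long> combined = mergeInto(new HashMap<>(fileA), fileB);
        System.out.println(combined);
    }
}
```

Map.merge inserts the value when the key is absent and otherwise applies the given function (here Long::sum) to the old and new values, so no explicit containsKey check is needed.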

Re: How to combine two Maps into one?

Posted by Mike Carey <dt...@gmail.com>.

Ah - you guys are way too self-energetic - you need to cheat sometimes!  
If you were to create a dataset in AsterixDB and run exactly that query 
on it, you can view the optimized query plan and the Hyracks job (and 
its location constraints).  That way you can (for future reference) see 
what connectors, operators, etc., are used there and how.  (All the 
machinery for what you are doing is definitely there for you, as 
AsterixDB does exactly that when processing that query. :-))

Cheers,

Mike


On 2/21/18 2:00 PM, Ahmed Eldawy wrote:
> Here's some context about this problem. We are trying to build a simple
> GroupBy-Aggregate function using Hyracks. Think about the following SQL
> query
> SELECT id, COUNT(id) FROM dataset GROUP BY id;
> Our design has two operators, local aggregator and global aggregator.
> The local aggregator processes one input split at a time and computes its
> histogram in the form of <ID, count>.
> The global aggregator combines all the pairs <ID, count> produced by the
> local aggregators to produce the final output as <ID, Sum(count)>
> The question is which type of connector we should use to connect the local
> aggregator to the global aggregator? While we know that an MtoN hash
> connector will work, where each machine combines a subset of the keys, our
> design is to combine all of them in a single machine. In other words, there
> has to be only one instance of the global aggregator running on a single
> machine.

Re: How to combine two Maps into one?

Posted by abdullah alamoudi <ba...@gmail.com>.

Hi Ahmed, Muhammad,

An MToNPartitioningMergingConnectorDescriptor would work. What you need to do is give the global aggregator a location constraint with a cardinality of 1, so that the connector has only one consumer partition (N = 1).
Everything else should just work.

Cheers,
Abdullah.
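
To see why a cardinality of 1 is enough, here is a small plain-Java simulation of hash-partitioned routing. It is a sketch under stated assumptions and does not use the Hyracks API: the class SingleConsumerRouting and its methods are hypothetical, and hashing with String.hashCode stands in for whatever partition computer the real connector is given. In Hyracks itself the constraint is set on the job specification; the PartitionConstraintHelper class appears to be the usual place for that, but please check it against your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation, not Hyracks code: M producers, N consumers, hash routing.
final class SingleConsumerRouting {
    // Picks the consumer partition for a key, the way a hash partitioner would.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String key, int consumerCount) {
        return Math.floorMod(key.hashCode(), consumerCount);
    }

    // Sends every <id, count> pair from every producer to its consumer partition
    // and sums the counts per id inside each consumer.
    static List<Map<String, Long>> route(List<Map<String, Long>> producers, int consumerCount) {
        List<Map<String, Long>> consumers = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            consumers.add(new HashMap<>());
        }
        for (Map<String, Long> local : producers) {
            for (Map.Entry<String, Long> e : local.entrySet()) {
                int target = partitionFor(e.getKey(), consumerCount);
                consumers.get(target).merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return consumers;
    }

    public static void main(String[] args) {
        // Made-up local histograms from two local aggregators.
        Map<String, Long> local1 = new HashMap<>();
        local1.put("id1", 2L);
        local1.put("id2", 1L);
        Map<String, Long> local2 = new HashMap<>();
        local2.put("id2", 4L);
        local2.put("id3", 3L);

        List<Map<String, Long>> producers = new ArrayList<>();
        producers.add(local1);
        producers.add(local2);

        // With one consumer, every key maps to partition 0, so the single
        // global aggregator sees all pairs.
        System.out.println(route(producers, 1));
    }
}
```

With consumerCount set to 1, partitionFor always returns 0, which is the M-to-1 behaviour asked for; raising it spreads the keys across several global aggregators, each holding a disjoint subset.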

Re: How to combine two Maps into one?

Posted by Ahmed Eldawy <el...@ucr.edu>.

Here's some context about this problem. We are trying to build a simple
GroupBy-Aggregate function using Hyracks. Think about the following SQL
query:
SELECT id, COUNT(id) FROM dataset GROUP BY id;
Our design has two operators, a local aggregator and a global aggregator.
The local aggregator processes one input split at a time and computes its
histogram in the form of <ID, count>.
The global aggregator combines all the pairs <ID, count> produced by the
local aggregators to produce the final output as <ID, Sum(count)>.
The question is which type of connector we should use to connect the local
aggregator to the global aggregator. While we know that an MtoN hash
connector will work, where each machine combines a subset of the keys, our
design is to combine all of them on a single machine. In other words, there
has to be only one instance of the global aggregator, running on a single
machine.

-- 

Ahmed Eldawy
Assistant Professor
http://www.cs.ucr.edu/~eldawy
Tel: +1 (951) 827-5654