Posted to dev@accumulo.apache.org by Enas Alkawasmi <ea...@uncc.edu> on 2019/03/28 04:06:26 UTC

Combining output of multiple filters/iterators

I have the following problem model:
I have a graph stored in Accumulo, and I want to design iterators that
retrieve all siblings of a given node.
I am thinking of nested iterators (filters): the first filters the graph table
by node_ID, and each filter then passes the nodes it finds to the next
iterator as a condition. I need to accumulate all the extracted nodes and
return them to the client; that is, I need all of the processing to be done
on the server side. My question is: how can I make one filter get its options
from the iterator executed before it? I also want each filter to be applied
to the same original data set, with the outputs of the iterators combined
after they have all finished. In other words, I need the iterators to call
each other in a nested way, passing values from their output to be used by
the next iterator, while keeping the result of each iterator aside, without
affecting the original table, and then combining them all at the end.
I read the MapReduce examples thoroughly with no luck. I also read about
iterators and filters and learned that I cannot directly control their
execution, although I can use priorities to control their order. I still
have the challenge of the interdependence between options. I need help coding
this in Java; can someone guide me to achieve what is described here:
" So it means if I set an Iterator and create a buffer in memory within
the iterator, it will be created on each tablet server, right? This is the
map-like function, but how can I return and combine all the buffers on the
client side (the reduce)? Does Iterator have some functionality to make this
process easy? Further, it would be a great help if you could provide some
sample code for the same, or if you have some similar implementation using
iterators or MapReduce"
I quoted from: 
https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo



--
Sent from: http://apache-accumulo.1065345.n5.nabble.com/Developers-f3.html

Re: Combining output of multiple filters/iterators

Posted by Enas Alkawasmi <ea...@uncc.edu>.

What do you mean by the following?
Christopher Tubbs-2 wrote:
> single iterator as a composition
> of other, smaller components.

What type of components are those?
Can you provide a template code structure that I can follow in my code? Do
you think MapReduce can fit my problem?




Re: Combining output of multiple filters/iterators

Posted by Christopher <ct...@apache.org>.

You could set up your iterators to communicate with each other when
initialized via the "source" parameter to the init() method.

However, because your iterators seem to be so dependent upon one
another, it might be better for you to implement this as a single
iterator. But, you can implement this single iterator as a composition
of other, smaller components, if necessary. A single iterator would
probably perform better anyway.
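A minimal, stdlib-only sketch of that idea, assuming a hypothetical table layout in which each node's row holds `parent:<id>` and `child:<id>` entries. The table is modeled as a `TreeMap` rather than read through Accumulo's `SortedKeyValueIterator`, and the component names (`PARENTS_OF`, `CHILDREN_OF`, `column`) are illustrative, not Accumulo APIs. The point it shows is that, inside one iterator, the output of one component is an ordinary Java value handed to the next, so no iterator options need to be passed between stages.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

class CompositeSiblingSketch {
    // Hypothetical layout: row = node ID; each row holds "parent:<id>" and "child:<id>" entries.
    // A TreeMap stands in for the sorted table data an iterator would read through its source.
    static final NavigableMap<String, List<String>> TABLE = new TreeMap<>();
    static {
        TABLE.put("a", List.of("child:b", "child:c", "child:d"));
        TABLE.put("b", List.of("parent:a"));
        TABLE.put("c", List.of("parent:a"));
        TABLE.put("d", List.of("parent:a", "parent:e"));
        TABLE.put("e", List.of("child:d", "child:f"));
        TABLE.put("f", List.of("parent:e"));
    }

    // Shared helper: the IDs stored under a given prefix in one row.
    static Set<String> column(String row, String prefix) {
        Set<String> out = new TreeSet<>();
        for (String entry : TABLE.getOrDefault(row, List.of())) {
            if (entry.startsWith(prefix)) {
                out.add(entry.substring(prefix.length()));
            }
        }
        return out;
    }

    // Component 1: node ID -> IDs of its parents.
    static final Function<String, Set<String>> PARENTS_OF = node -> column(node, "parent:");

    // Component 2: parent IDs -> union of their children.
    static final Function<Set<String>, Set<String>> CHILDREN_OF = parents -> {
        Set<String> out = new TreeSet<>();
        for (String parent : parents) {
            out.addAll(column(parent, "child:"));
        }
        return out;
    };

    // The single iterator's logic: component 1's output goes straight into
    // component 2 as a Java value, then the queried node itself is removed.
    static Set<String> siblings(String node) {
        Set<String> result = PARENTS_OF.andThen(CHILDREN_OF).apply(node);
        result.remove(node);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(siblings("d")); // prints [b, c, f]
    }
}
```

In a real iterator, `column` would be replaced by seeking the source to a row and reading its entries; the composition itself stays the same.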

On Thu, Mar 28, 2019 at 9:44 AM Enas Alkawasmi <ea...@uncc.edu> wrote:
> I have the following problem model: [...]

Re: Combining output of multiple filters/iterators

Posted by Josh Elser <el...@apache.org>.

You cannot feasibly hold onto some intermediate batch of nodes in 
memory. You're invalidating the general premise of how Accumulo 
iterators are meant to work in doing this. Further, an Iterator can 
_only_ safely operate within one row of a table. Two adjacent rows may 
be located on two different physical machines.
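A small sketch of what that means, assuming hypothetical split points `c` and `f`. In Accumulo a tablet covers the rows after the previous split point up to and including its own end row, so the smallest split point at or above a row identifies the tablet that holds it. The helper `tabletFor` is illustrative only, not an Accumulo API.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class TabletSplitSketch {
    // Hypothetical split points. Each tablet covers the rows after the previous
    // split point up to and including its own end row; the last tablet is unbounded.
    static final NavigableSet<String> SPLITS = new TreeSet<>(List.of("c", "f"));

    // The tablet holding a row is identified by the smallest split point >= row.
    static String tabletFor(String row) {
        String endRow = SPLITS.ceiling(row);
        return endRow == null ? "<last>" : endRow;
    }

    public static void main(String[] args) {
        for (String row : List.of("a", "b", "c", "d", "f", "g")) {
            System.out.println(row + " -> tablet ending at " + tabletFor(row));
        }
    }
}
```

With these splits, the adjacent rows `c` and `d` fall into different tablets, which may be served by different tablet servers, so an iterator running over the tablet that holds `c` never sees row `d`.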

I would suggest reading through this presentation and taking some
time to understand why they did it this way:
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf.
You might also be able to take something from Shana Hutchison's work on 
Graphulo: https://arxiv.org/abs/1606.07085

On 3/29/19 2:20 PM, Enas Alkawasmi wrote:
> Thank you for this suggestion. I have one question: can I pass options to the
> new source that come from the result of the current iterator? The new
> iterator needs to get the parent nodes from the current one. How can I
> make the iterator wait for the result from its preceding iterator?

Re: Combining output of multiple filters/iterators

Posted by Enas Alkawasmi <ea...@uncc.edu>.

Thank you for this suggestion. I have one question: can I pass options to the
new source that come from the result of the current iterator? The new
iterator needs to get the parent nodes from the current one. How can I
make the iterator wait for the result from its preceding iterator?




Re: Combining output of multiple filters/iterators

Posted by Keith Turner <ke...@deenlo.com>.

You may be able to use the deepCopy() method.  This allows an iterator
to create multiple copies of its source.  Then it can seek each copy
separately.   Deep copies should be created in the init method.  The
following is an example of this.

https://github.com/apache/accumulo/blob/rel/1.9.2/core/src/main/java/org/apache/accumulo/core/iterators/user/IntersectingIterator.java#L502
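A stdlib-only sketch of the deep-copy pattern applied to the sibling query, assuming a hypothetical layout in which each node's row holds `parent:<id>` and `child:<id>` columns. `SortedCursor` is a simplified stand-in for Accumulo's `SortedKeyValueIterator` (whose real method is `deepCopy(IteratorEnvironment)` and whose `seek` takes a `Range`); none of the names below are Accumulo APIs. The primary source reads the queried node's row, then the copy is seeked to each parent's row. Because step 2 only starts after step 1 has returned, no explicit waiting is needed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

class DeepCopySketch {
    // Hypothetical, simplified stand-in for SortedKeyValueIterator: a cursor over
    // sorted "row/column" keys that can be re-seeked and deep-copied.
    static class SortedCursor {
        private final NavigableSet<String> data; // shared by all copies, never modified
        private Iterator<String> it;
        private String top;

        SortedCursor(NavigableSet<String> data) {
            this.data = data;
        }

        // Position the cursor on the first key of the given row.
        void seekRow(String row) {
            it = data.subSet(row + "/", true, row + "/" + Character.MAX_VALUE, true).iterator();
            next();
        }

        boolean hasTop() { return top != null; }

        String getTopColumn() { return top.substring(top.indexOf('/') + 1); }

        void next() { top = it.hasNext() ? it.next() : null; }

        // A new cursor over the same data with its own, independent position.
        SortedCursor deepCopy() { return new SortedCursor(data); }
    }

    static class SiblingIterator {
        private SortedCursor source;     // reads the queried node's row
        private SortedCursor parentRows; // deep copy, re-seeked to each parent's row

        // The deep copy is created once, at init time.
        void init(SortedCursor source) {
            this.source = source;
            this.parentRows = source.deepCopy();
        }

        // Collect the IDs stored under a column prefix in one row.
        private static List<String> read(SortedCursor cursor, String row, String prefix) {
            List<String> out = new ArrayList<>();
            cursor.seekRow(row);
            while (cursor.hasTop()) {
                String column = cursor.getTopColumn();
                if (column.startsWith(prefix)) {
                    out.add(column.substring(prefix.length()));
                }
                cursor.next();
            }
            return out;
        }

        Set<String> siblingsOf(String node) {
            // Step 1: the node's parents, read from its own row via the primary source.
            List<String> parents = read(source, node, "parent:");
            // Step 2 starts only after step 1 has returned, so it uses that result directly.
            Set<String> siblings = new TreeSet<>();
            for (String parent : parents) {
                siblings.addAll(read(parentRows, parent, "child:"));
            }
            siblings.remove(node);
            return siblings;
        }
    }

    static Set<String> demo(String node) {
        NavigableSet<String> table = new TreeSet<>(List.of(
                "a/child:b", "a/child:c", "a/child:d",
                "b/parent:a", "c/parent:a", "d/parent:a", "d/parent:e",
                "e/child:d", "e/child:f", "f/parent:e"));
        SiblingIterator iterator = new SiblingIterator();
        iterator.init(new SortedCursor(table));
        return iterator.siblingsOf(node);
    }

    public static void main(String[] args) {
        System.out.println(demo("d")); // prints [b, c, f]
    }
}
```

As noted earlier in the thread, a real iterator only sees the data of the tablet it runs on, so this approach only finds parent rows hosted in that same tablet.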

On Thu, Mar 28, 2019 at 9:44 AM Enas Alkawasmi <ea...@uncc.edu> wrote:
> I have the following problem model: [...]