Posted to user@ignite.apache.org by Evgenii Zhuravlev <e....@gmail.com> on 2018/01/16 14:28:32 UTC

Re: Questions regarding Ignite as hdfs cache

Hi,

1) I think you can run Ignite in non-quiet mode with the '-v' flag, or even
with DEBUG logging; the logs will definitely show whether IGFS is being used.
Alternatively, you can deliberately break the IGFS configuration and check
whether your queries still work. If you get exceptions, Ignite was in use.
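
For the DEBUG option, one possible sketch is to raise the log level for the
IGFS internals only. This assumes logging goes through the ignite-log4j module
with a log4j 1.x style XML file, such as the config/ignite-log4j.xml shipped
with the distribution. Restricting the category to the IGFS package is an
assumption about what is useful here, not a documented recipe.

```xml
<!-- Sketch only: DEBUG output for IGFS internals in a log4j 1.x config file. -->
<category name="org.apache.ignite.internal.processors.igfs">
    <level value="DEBUG"/>
</category>
```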

2,3,4) I'm not sure how large your dataset is, but I would recommend giving
IGFS more memory. All of the data should fit in IGFS; otherwise, a lot of data
may have to be moved from HDFS to IGFS, which, obviously, may affect
performance.
Also, regarding the query you've shared: it looks quite strange to me that you
are joining 6 tables while taking only 3 fields from them. Are you sure that
you are using an optimal DB structure?
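
On the memory point, a rough sketch of what giving IGFS more memory could look
like is the same memoryConfiguration property from the configuration quoted
below, with a larger defaultMemoryPolicySize. The 16 GB figure is only an
illustrative assumption; whatever value is chosen has to fit within each
node's 32 GB of RAM alongside the 10 GB heap and the Hadoop processes.

```xml
<!-- Sketch only: same property as in the quoted config, with a larger value.
     16 GB is an assumed figure, not a recommendation. -->
<property name="memoryConfiguration">
    <bean class="org.apache.ignite.configuration.MemoryConfiguration">
        <property name="defaultMemoryPolicySize" value="#{16L * 1024 * 1024 * 1024}"/>
    </bean>
</property>
```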

5) Ignite uses its own serialization algorithm; you can read about it here:
https://apacheignite.readme.io/docs/binary-marshaller

Evgenii

2017-11-02 9:49 GMT+03:00 shailesh prajapati <sh...@gmail.com>:

> Hello,
>
> I am evaluating Ignite as an HDFS cache to speed up my Hive queries.
> I am using Hive with Tez. Below are my cluster and Ignite
> configurations:
>
> *Cluster: *
> 4 data nodes with 32gb RAM each, 1 edge node
> 4 ignite servers, one for each data node. Ignite servers were started with
> Xmx10g
>
> *Setup done using:*
> https://apacheignite-fs.readme.io/docs/installing-on-hortonworks-hdp
> https://apacheignite-fs.readme.io/docs/running-apache-hive-over-ignited-hadoop
>
> *Ignite configuration file (provided to each ignite server): *
> <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>     <property name="memoryConfiguration">
>         <bean class="org.apache.ignite.configuration.MemoryConfiguration">
>             <property name="defaultMemoryPolicySize" value="#{8L * 1024 * 1024 * 1024}"/>
>         </bean>
>     </property>
>     <property name="connectorConfiguration">
>         <bean class="org.apache.ignite.configuration.ConnectorConfiguration">
>             <property name="port" value="11211"/>
>         </bean>
>     </property>
>     <property name="fileSystemConfiguration">
>         <list>
>             <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
>                 <!-- IGFS name you will use to access IGFS through Hadoop API. -->
>                 <property name="name" value="igfs"/>
>
>                 <!-- Configure TCP endpoint for communication with the file system instance. -->
>                 <property name="ipcEndpointConfiguration">
>                     <bean class="org.apache.ignite.igfs.IgfsIpcEndpointConfiguration">
>                         <property name="type" value="TCP" />
>                         <property name="host" value="0.0.0.0" />
>                         <property name="port" value="10500" />
>                     </bean>
>                 </property>
>
>                 <!-- Configure secondary file system if needed. -->
>                 <property name="secondaryFileSystem">
>                     <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
>                         <property name="fileSystemFactory">
>                             <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
>                                 <property name="uri" value="hdfs://<hostip>:8020/"/>
>                             </bean>
>                         </property>
>                     </bean>
>                 </property>
>             </bean>
>         </list>
>     </property>
>     <property name="discoverySpi">
>         <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>             <property name="ipFinder">
>                 <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>                     <property name="addresses">
>                         <list>
>                             <value>node1:47500..47509</value>
>                             <value>node2:47500..47509</value>
>                             <value>node3:47500..47509</value>
>                             <value>node4:47500..47509</value>
>                         </list>
>                     </property>
>                 </bean>
>             </property>
>         </bean>
>     </property>
> </bean>
>
> *Dataset used for the experiment: *
> TPCH
> customer 1500000 rows
> lineitem 59986052 rows
> nation 25 rows
> orders 15000000 rows
> part 2000000 rows
> partsupp 8000000 rows
> region 5 rows
> supplier 100000 rows
>
> and using standard TPCH queries
>
> *Querying from hive shell with below properties:*
> set fs.default.name=igfs://igfs@node1:10500/;
>
>
>
> I now have the following questions:
>
> 1) My queries are running fine with the above configuration. I want to
> see whether the data is being cached and served from the cache. How should I
> check this? I used Ignite Visor to see if the data is available in a cache,
> but I did not find any cache there.
>
> Although, in the Ignite server logs, I can see messages for local node
> metrics like the ones shown below. The heap usage increases continuously
> while a query is running. What does this mean?
>
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>     ^-- Node [id=e38943b2, name=null, uptime=03:02:18:866]
>     ^-- H/N/C [hosts=4, nodes=4, CPUs=32]
>     ^-- CPU [cur=0.23%, avg=0.13%, GC=0%]
>     ^-- PageMemory [pages=7381]
>     ^-- Heap [used=1050MB, free=88.46%, comm=3343MB]
>     ^-- Non heap [used=83MB, free=98.45%, comm=84MB]
>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>     ^-- System thread pool [active=0, idle=6, qSize=0]
>     ^-- Outbound messages queue [size=0]
>
>
> 2) I ran queries on both hive+tez+hdfs and hive+tez+ignite+hdfs. I found
> that the queries are slower when using Ignite as a cache layer. For example,
> consider the standard TPCH query below:
>
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'AFRICA'
> and o_orderdate >= '1993-01-01'
> and o_orderdate < '1994-01-01'
> group by
> n_name
> order by
> revenue desc;
>
> Hive+tez avg time: 35.542s
> Hive+tez+ignite avg time: 38.221s
>
> Am I using the wrong configuration?
>
> 3) I tried running queries with Ignite MR with the configs below set in Hive.
> set hive.rpc.query.plan = true;
> set hive.execution.engine = mr;
> set mapreduce.framework.name = ignite;
> set mapreduce.jobtracker.address = node1:11211;
>
> The queries were even slower than hive+tez+ignite. Is there any other
> configuration for Ignite MR that I need to set?
>
> 4) Are my configurations optimal? If not, can you please suggest better ones?
>
> 5) What serialization algorithm (Kryo, native Java, ...) does Ignite use?
>
> Thanks
>
>
>
>