Posted to dev@atlas.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2016/05/02 14:39:12 UTC

[jira] [Updated] (ATLAS-690) Read timed out exceptions when tables are imported into Atlas.

     [ https://issues.apache.org/jira/browse/ATLAS-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated ATLAS-690:
-----------------------------------
    Attachment: ATLAS-690-3.patch

This patch fixes two things, mainly:

* For unique attributes of user-defined types, this adds a composite index that combines the attribute with state. This fixes the regression where unique attribute lookups became slow: we now look up unique attributes along with state, and without the index those lookups are slow.
* When adding / updating an entity, we create a property that aggregates the values of all the entity's attributes and the attribute values of its first-level references, so that a full-text index property can be added. As mentioned in the last comment, a recent fix added a back reference from a hive column to the hive table. This caused every hive column to load the hive table vertex along with all its columns again, which produced the quadratic effect on times that we observed. To fix this, I have attempted to cache the GUID of an entity against the reference created by the first load. This cache applies only to that request and does not persist across requests (to avoid any memory issues).
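
To make the second point concrete, here is a minimal sketch of a per-request cache. It is not the code in ATLAS-690-3.patch: every class and method name below (RequestScopedCacheSketch, FakeStore, RequestCache, and so on) is hypothetical, the graph store is replaced by an in-memory map that counts loads, and I am assuming the cache maps a GUID to the instance produced by the first load. It only illustrates why the back reference makes the number of loads quadratic in the number of columns, and why a cache that lives for a single request brings it back to linear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: all names here are hypothetical, not Atlas classes.
class RequestScopedCacheSketch {

    /** Stand-in for a loaded entity: its GUID plus the GUIDs it references. */
    static final class Entity {
        final String guid;
        final List<String> referencedGuids;

        Entity(String guid, List<String> referencedGuids) {
            this.guid = guid;
            this.referencedGuids = referencedGuids;
        }
    }

    /** Stand-in for the graph store; counts how many loads are issued. */
    static final class FakeStore {
        private final Map<String, Entity> entities = new HashMap<>();
        int loads = 0;

        void put(Entity e) { entities.put(e.guid, e); }

        Entity load(String guid) {
            loads++;
            return entities.get(guid);
        }
    }

    /** Created per request and discarded afterwards, so it cannot grow unbounded. */
    static final class RequestCache {
        private final Map<String, Entity> byGuid = new HashMap<>();
        private final FakeStore store;

        RequestCache(FakeStore store) { this.store = store; }

        Entity get(String guid) {
            Entity cached = byGuid.get(guid);
            if (cached == null) {
                cached = store.load(guid);   // only the first lookup hits the store
                byGuid.put(guid, cached);
            }
            return cached;
        }
    }

    /** One table with the given number of columns; each column refers back to the table. */
    static FakeStore buildTable(int columns) {
        FakeStore store = new FakeStore();
        List<String> columnGuids = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            String guid = "col-" + i;
            columnGuids.add(guid);
            store.put(new Entity(guid, Collections.singletonList("table")));
        }
        store.put(new Entity("table", columnGuids));
        return store;
    }

    /** Every column reloads the table and all of the table's columns. */
    static int loadsWithoutCache(int columns) {
        FakeStore store = buildTable(columns);
        Entity table = store.load("table");
        for (String columnGuid : table.referencedGuids) {
            Entity column = store.load(columnGuid);
            for (String ref : column.referencedGuids) {      // back reference to the table
                Entity again = store.load(ref);
                for (String c : again.referencedGuids) {
                    store.load(c);                           // the table's columns, again
                }
            }
        }
        return store.loads;
    }

    /** Same traversal, but every lookup goes through the per-request cache. */
    static int loadsWithCache(int columns) {
        FakeStore store = buildTable(columns);
        RequestCache cache = new RequestCache(store);
        Entity table = cache.get("table");
        for (String columnGuid : table.referencedGuids) {
            Entity column = cache.get(columnGuid);
            for (String ref : column.referencedGuids) {
                Entity again = cache.get(ref);
                for (String c : again.referencedGuids) {
                    cache.get(c);
                }
            }
        }
        return store.loads;
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + loadsWithoutCache(10));
        System.out.println("with cache:    " + loadsWithCache(10));
    }
}
```

In this toy model, a table with n columns costs 1 + 2n + n^2 loads without the cache and 1 + n loads with it (121 versus 11 for 10 columns). The real numbers in Atlas will differ, but the shape of the growth is the point.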

With these two fixes, a 1000-table load in my local setup takes almost the same time as it did before the regressions.

However, this patch is not yet ready for submission. In particular, a couple of tests are failing. Also, I request [~ssainath] to run with this patch in her environment to make sure we are seeing the improvements at scale. I also need to enhance the tests for the code I've written. I will update with a new patch once these are done. Still, I would appreciate it if someone could look at the fix and provide early feedback.

> Read timed out exceptions when tables are imported into Atlas.
> --------------------------------------------------------------
>
>                 Key: ATLAS-690
>                 URL: https://issues.apache.org/jira/browse/ATLAS-690
>             Project: Atlas
>          Issue Type: Bug
>         Environment: Atlas with External Kafka/  HBase / Solr
> atlas.notification.hook.numthreads=5
> ATLAS_HOOK created with 5 partitions
>            Reporter: Sharmadha Sainath
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>         Attachments: ATLAS-690-3.patch
>
>
> When 1000 tables are imported into Atlas using the Hive hook, read timed out exceptions occur. This happened with the latest Atlas build, commit id 922a83c9a10e857d54855463225e9a5c375bc2b9:
>    • Hive ingestion was completed in 1 minute 50 secs.
>    • Atlas ingestion took more than an hour.
> In the last 1000-table run, which was done on Atlas with commit id b9575f29df3cc014f1b076abf52d88249bf4d0ef:
>    • Hive ingestion was completed in 3 minutes.
>    • Atlas ingestion was completed in 5 minutes.
> The exception stack trace:
> Error handling message org.apache.atlas.notification.hook.HookNotification$EntityUpdateRequest@7474dd2d (NotificationHookConsumer:224)
> com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
> at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
> at com.sun.jersey.api.client.Client.handle(Client.java:652)
> at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
> at com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634)
> at org.apache.atlas.AtlasClient.callAPIWithResource(AtlasClient.java:911)
> at org.apache.atlas.AtlasClient.callAPIWithRetries(AtlasClient.java:565)
> at org.apache.atlas.AtlasClient.callAPI(AtlasClient.java:935)
> at org.apache.atlas.AtlasClient.updateEntities(AtlasClient.java:530)
> at org.apache.atlas.notification.NotificationHookConsumer$HookConsumer.run(NotificationHookConsumer.java:216)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:689)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
> at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
> at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)