Don't cache resolved hostnames forever #16412
The DNS cache of a Java process is handled by the property
As per the Oracle documentation on policy files, the above-mentioned property (and maybe also the related property It is possible to provide an application-specific I wonder whether we should just point users to the Oracle documentation on how to change the DNS cache lifetime system-wide, because I am not sure whether they want to configure different (Java) DNS cache lifetimes for different applications on the same machine anyway. What do you think, @clintongormley?
@danielmitterdorfer I'd be happy with just adding this documentation. Is there anything we need to change code-wise as well? Or should we perhaps just add tests to ensure that the documented solution works?
Also related to #10337, logstash-plugins/logstash-output-elasticsearch#131 and #11256.
We run all our JVMs with this configuration. We also only pass hostnames to
I would appreciate it if somebody else could verify that it works as expected.
@clintongormley I would just point users to the Oracle documentation and not add any tests for two reasons:
@miah I have created a small demo program to verify that everything works as expected (source code in a gist). In addition, I've set these values in my
When you look at the demo source code, you'll see that we query for one existing host ("www.google.com") and for one non-existing one ("www.this-does-not-exist.io"). My expectation is that we see DNS requests every 10 seconds for the existing host and only one for the non-existing one when we periodically clear the OS DNS cache. When you invoke the demo program as described in the gist and open tcpdump (I did
Output from the application:
Output from tcpdump:
You can see that we query only once for "www.this-does-not-exist.io" but every 10 seconds for "www.google.com". Similarly, you should also see only one DNS request for "google.com" when you set
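The gist itself is not reproduced in this thread. As a hedged illustration only, a demo along the lines described could look like the following; the class name DnsCacheDemo, the resolveOnce helper and the optional round-count argument are assumptions, not the original code. It resolves both hosts every 10 seconds so the resulting DNS traffic can be watched with tcpdump. Whether a request actually leaves the machine depends on the JVM cache settings and on whether a security manager is active, neither of which the code itself configures.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of a DNS cache demo; not the original gist.
public class DnsCacheDemo {

    // Resolves one host and returns a summary line such as "host -> 1.2.3.4".
    static String resolveOnce(String host) {
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            StringBuilder sb = new StringBuilder(host).append(" ->");
            for (InetAddress address : addresses) {
                sb.append(' ').append(address.getHostAddress());
            }
            return sb.toString();
        } catch (UnknownHostException e) {
            return host + " -> unknown host";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // One existing and one non-existing host, as in the description above.
        List<String> hosts = Arrays.asList("www.google.com", "www.this-does-not-exist.io");
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 6;
        for (int i = 0; i < rounds; i++) {
            for (String host : hosts) {
                System.out.println(resolveOnce(host));
            }
            Thread.sleep(10_000L); // 10-second interval between lookup rounds
        }
    }
}
```

Run it while tcpdump is capturing port 53 traffic; with a positive TTL of 10 seconds and a long negative TTL, the expectation stated above is a request per round for the existing host and a single one for the missing host.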
@miah One more thing: note that the program runs with the security manager enabled. The JDK implementation behaves differently depending on whether or not a security manager is enabled (see also my comment above with the link to the Oracle docs). As Elasticsearch 3.0 will make security mandatory (#16176), it is sensible to assume here too that a security manager is enabled.
It would also be nice if the transport client could do the lookups when it connects, instead of only when the client object is created, as IPs can change, e.g. when going through a load balancer.
Although it keeps the hostname, an InetAddress never attempts to re-resolve the IP after it has been created. I think this is a source of confusion for many users of the transport client. It is particularly important when connecting to the cloud service: due to the load balancers, even a single-node cluster has multiple IP addresses, and they may change at any point in time. @clintongormley Is this issue also relevant for the transport client, or should we open a separate issue?
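A minimal sketch of the behaviour described above, using only standard java.net calls. InetAddress.getByAddress pairs a hostname with a fixed IP without performing any lookup, which mirrors what a client holds after its one-time resolution. The hostname es.example.com and the address 10.0.0.1 are made-up values, and the class name is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Shows that an InetAddress keeps its hostname but never re-resolves its IP.
public class StaleInetAddressDemo {

    // Made-up values for illustration only.
    static final String HOST = "es.example.com";
    static final byte[] IP = {10, 0, 0, 1};

    // getByAddress performs no DNS lookup; it just pairs the name with the IP,
    // much like an address object created once when a client starts up.
    static InetAddress snapshot() {
        try {
            return InetAddress.getByAddress(HOST, IP);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal address length, which cannot happen here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress address = snapshot();
        // The hostname is retained...
        System.out.println(address.getHostName());    // es.example.com
        // ...but the IP stays fixed for the lifetime of the object, even if the
        // DNS record for the hostname is later changed to point elsewhere.
        System.out.println(address.getHostAddress()); // 10.0.0.1
        // Picking up a new IP requires a fresh lookup by name, for example
        // InetAddress.getAllByName(address.getHostName()), which is in turn
        // subject to the JVM DNS cache (networkaddress.cache.ttl).
    }
}
```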
@alexbrasetvik, @beiske: I don't know what @clintongormley thinks about your idea, but I think it would be better to create a new ticket for the transport client topic.
What's the status on this? This causes problems with the AWS ElasticSearch service.
@lifeofguenter The topic discussed in this ticket has nothing to do with Elasticsearch per se. It is a pure JVM-level setting, so (in the scope of this ticket) we will not change any code but may just add documentation on how to change this setting for Elasticsearch. Just to be sure: by "AWS ElasticSearch service" you mean Amazon's service and not our own Elasticsearch cloud offering (called Elastic Cloud), right? We have no additional insight into Amazon's offering, and I am afraid you will not be able to change a JVM-level setting there. I fear this has to be addressed by Amazon (as it is a JVM-level setting that we cannot change from within the application).
I made the following changes (Ubuntu 14.04) in
But somehow that did not do the trick. Yes, I am referring to https://aws.amazon.com/elasticsearch-service/. However, we are running Logstash (set up as described in http://www.lifeofguenter.de/2016/01/elk-aws-elasticbeanstalk-laravel.html) on our "own" EC2 instance, so we are able to make changes there, and that is also the part that currently complains if the DNS record for ElasticSearch changes. UPDATE: sorry, my problems are most probably unrelated!
I still have this issue. If I replace the masters, I need to reboot every node in the cluster; otherwise it never detects the IP changes. I have a DNS TTL of 6 minutes. I replaced my master servers, and 20 minutes later Elasticsearch is still trying to connect to the old IPs. I have the java.security changes in place. Elasticsearch is configured to connect to a round-robin DNS entry for the master nodes.
@clintongormley @danielmitterdorfer I don't think it's correct that setting the DNS cache properties at the JVM level will, today, resolve the problems being reported here. The underlying reason is that we do the hostname lookup during initialization of unicast zen ping and never look the hosts up again. This is currently a deliberate choice.
@jasontedor That definitely seems to be the case.
@jasontedor Agreed. In that case the DNS cache settings will not help. As you explicitly mention that this is a deliberate choice, does it make sense to close this ticket then (and maybe document the decision or at least its consequences)?
I think it's OK to re-resolve the configured unicast host list when pinging. We don't do it often (only on master loss/initialization), and we also ping all IPs of the last known nodes on top of it. My only concern is that a DNS resolution timeout or failure should not block or delay the pinging (remember we do it on master loss, and we block writes until pinging is done). This means implementing this can be tricky code-wise (that code is already hairy).
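One way to address the concern about lookups blocking a ping round is to start all lookups concurrently and give them a shared deadline, skipping any host that is slow or fails. The following is a minimal sketch under that assumption; BoundedResolver and resolveAll are hypothetical names, not Elasticsearch code, and a real implementation would use the node's own thread pools and logging instead of System.err.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: re-resolve hosts without letting DNS block the caller.
public class BoundedResolver {

    // Daemon threads so a hung lookup never keeps the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "dns-resolver");
        thread.setDaemon(true);
        return thread;
    });

    // Starts all lookups at once, then waits at most timeoutMillis in total.
    // Hosts that fail or time out are skipped instead of delaying the round.
    public List<InetAddress> resolveAll(List<String> hosts, long timeoutMillis) {
        Map<String, Future<InetAddress[]>> pending = new LinkedHashMap<>();
        for (String host : hosts) {
            pending.put(host, executor.submit(() -> InetAddress.getAllByName(host)));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<InetAddress> resolved = new ArrayList<>();
        for (Map.Entry<String, Future<InetAddress[]>> entry : pending.entrySet()) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                resolved.addAll(Arrays.asList(entry.getValue().get(remaining, TimeUnit.NANOSECONDS)));
            } catch (TimeoutException e) {
                // The lookup thread may still finish in the background.
                entry.getValue().cancel(true);
                System.err.println("DNS lookup timed out for " + entry.getKey());
            } catch (ExecutionException e) {
                System.err.println("DNS lookup failed for " + entry.getKey() + ": " + e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return resolved;
    }
}
```

InetAddress.getAllByName cannot be interrupted while the native resolver is waiting, so a timed-out lookup keeps occupying its thread until the OS resolver gives up; the daemon threads only ensure that this does not prevent JVM shutdown.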
I think so too; I'm only explaining why the DNS cache settings here did and do nothing.
So it sounds to me like we should remove the "Discuss" label and add "AdoptMe" instead.
Any good workarounds for this? I have a similar problem running on Docker in swarm mode, where the master/gossip nodes are running as a service and the data nodes point to the service name. As Docker uses DNS for discovery, this is a problem there as well.
@thxmasj I provide the full list (as reported by Docker) explicitly. That mitigates, but does not resolve, the problem.
This is now addressed in the forthcoming 5.1.0 (no date, sorry). If you are in an environment where DNS records change, you will have to adjust the JVM DNS cache settings in your system security policy. Please consult the Zen discovery docs for details.
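The comment above recommends the system security policy (the java.security file of the JDK), which is the usual route for Elasticsearch. For an application that controls its own startup, the same two security properties can also be set programmatically with java.security.Security.setProperty, as sketched below. The TTL values and class name are placeholders. The code assumes it runs before the first hostname lookup, because the JDK typically reads the cache policy only once; with a security manager installed, the call also needs the matching SecurityPermission.

```java
import java.security.Security;

// Hypothetical sketch: configure the JVM DNS cache from code instead of java.security.
public class DnsCacheSettings {

    // Seconds to cache successful lookups (placeholder; -1 would mean "forever").
    static final String POSITIVE_TTL = "60";
    // Seconds to cache failed lookups (placeholder).
    static final String NEGATIVE_TTL = "10";

    // Must run before any hostname is resolved, or the JVM may ignore it.
    public static void apply() {
        Security.setProperty("networkaddress.cache.ttl", POSITIVE_TTL);
        Security.setProperty("networkaddress.cache.negative.ttl", NEGATIVE_TTL);
    }

    public static void main(String[] args) {
        apply();
        System.out.println("networkaddress.cache.ttl=" + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl="
                + Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```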
Has anyone tried enabling
Is there any update on this issue? What's the fix?
Yes, it's addressed starting in Elasticsearch 5.1.1. You can read about this in the Zen discovery docs.
Thanks for the quick reply. I'm using ES version 2.1. When one of the ES instances is rebuilt, it throws a NoRouteToHostException. To resolve the issue, I am forced to rebuild the dependent applications that connect to ES as clients. Setting client.transport.sniff=true enables discovery of new nodes added to the ES cluster. Is there a workaround for ES 2.1?
As mentioned in the documentation, I did set networkaddress.cache.ttl=0 in java.security, but it didn't resolve the issue. When I ping the new ES instance, its hostname resolves to the new IP address as expected, so I don't think it's a DNS caching issue.
It's not addressed in the 2.x series; there is nothing you can do there. It's only resolved as of 5.1.1.
Thank you for the clarification. Could you please share your input on the following? Below are my observations with a 3-node ES cluster: I rebuilt ES node one, waited until the new node was built with a new IP and added back to the cluster with green health, and then rebuilt the other 2 nodes the same way. After rebuilding the ES nodes one after another, the 3 nodes now have 3 new IP addresses.
With client.transport.sniff=false: to fix this, I need to restart my application to get new connections to the new ES nodes.
With client.transport.sniff=true: NoRouteToHostException is logged continuously, but I did not see org.elasticsearch.client.transport.NoNodeAvailableException. I am able to perform CRUD operations from my client application, and the netstat -an | grep 9300 command shows connections to the new ES IP addresses. In this case (client.transport.sniff=true), the NoRouteToHostException is just a WARN and has no impact on the CRUD operations on ES, but the logs keep growing because of it. Please share your thoughts on this.
Please open a topic on the forum. We prefer to use the forums for general discussions, and reserve GitHub for verified bug reports and feature requests.
Thanks, it's done.
I'm seeing the same issue with Spring Boot 2:
Looks like a Spring bug, not yours, sorry.
Today we use InetAddress to represent IP addresses. InetAddress handles the work of resolving hostnames from DNS and from the local hosts file. With the security manager enabled, successful hostname lookups are cached forever to prevent spoofing attacks. I don't know if this behaviour was different before the security manager was enabled, but it seems unlikely given issues such as #10337 and #14441.
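To check which cache setting a given JVM will actually pick up, a small sketch like the following can help. It assumes an OpenJDK-based JVM, where the networkaddress.cache.ttl security property takes precedence and the older sun.net.inetaddr.ttl system property serves as a fallback; the class and method names are hypothetical.

```java
import java.security.Security;

// Hypothetical sketch: report the DNS cache setting this JVM will see.
public class DnsCachePolicyReport {

    static String describe() {
        // Security property, usually set in the JDK's java.security file.
        String ttl = Security.getProperty("networkaddress.cache.ttl");
        if (ttl != null) {
            return "networkaddress.cache.ttl=" + ttl;
        }
        // Legacy system property, consulted only when the security property is unset.
        String legacy = System.getProperty("sun.net.inetaddr.ttl");
        if (legacy != null) {
            return "sun.net.inetaddr.ttl=" + legacy;
        }
        // Neither set: successful lookups are cached forever with a security
        // manager installed, and for an implementation-specific time otherwise.
        return "default (forever with a security manager, implementation-specific otherwise)";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```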
It would be a useful improvement to be able to specify unicast hosts as hostnames that are looked up from DNS or the hosts file; then, if the IP addresses change and the node needs to reconnect to the cluster, it can simply do a fresh lookup to gather the current IPs. Similar logic would help the clients.
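A minimal sketch of that proposal, assuming the configured unicast hosts are kept as plain hostname strings and resolved again on every ping round or reconnect. UnicastHostsProvider and currentAddresses are hypothetical names, not the Elasticsearch implementation, and the sketch leaves out the configurability and certificate-verification concerns. Each call still goes through the JVM DNS cache, so the TTL settings discussed in this issue continue to apply.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep hostnames, resolve them fresh on every call.
public class UnicastHostsProvider {

    private final List<String> configuredHosts;

    public UnicastHostsProvider(List<String> configuredHosts) {
        // Store names, not pre-resolved InetAddress objects.
        this.configuredHosts = new ArrayList<>(configuredHosts);
    }

    // Called on every ping round or reconnect, so a changed DNS record or hosts
    // file entry is picked up once the JVM's own cache entry has expired.
    public List<InetAddress> currentAddresses() {
        List<InetAddress> result = new ArrayList<>();
        for (String host : configuredHosts) {
            try {
                // getAllByName returns every address record, e.g. all members of a round-robin entry.
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    result.add(address);
                }
            } catch (UnknownHostException e) {
                // Skip hosts that cannot be resolved right now; retry next round.
                System.err.println("could not resolve " + host + ": " + e.getMessage());
            }
        }
        return result;
    }
}
```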
If we make this change, it should be configurable (otherwise we're introducing the chance for spoofing), and we should consider the impact on hostname verification of SSL certificates.
Testing this change would be hard...