Recently I ran into a problem with my <a href="http://www.ejabberd.im/">ejabberd</a> server (the one running for this very domain). Sometime around Friday, without any configuration change at all, I became unable to see the status of any external contacts in my Jabber client, <a href="http://gajim.org/">Gajim</a>. I didn't notice until Saturday, when a friend of mine pointed out that I wasn't showing up on his roster. After some initial investigation (checking the firewall, which hasn't really changed in a while, and making sure the server itself was running OK), I gave up for the day, since it was my nephew's birthday.
Of course, I couldn't let the issue drag on, since I depend on my Jabber server to reach some of my contacts. I spent about 2-3 hours searching the Internet for a solution. All DNS requests seemed to work fine with the dig command, so I ruled DNS out as a possible cause (a wrong assumption, as it turned out later). What got me going in the right direction was Debian bug report <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=539409">#539409</a>. When I tried the commands suggested in that bug report, I found that DNS queries issued that way seemed to wait indefinitely. At first I thought /etc/resolv.conf wasn't being parsed correctly, so I made a backup copy and reduced the file to a single nameserver directive. The issue remained, but then I decided to try querying some other DNS server for the <i>same</i> SRV record. And... it returned a proper response. After two more dig commands, I finally found out that the DNS server listed first in /etc/resolv.conf did not respond to any query at all. I rolled back to the old resolv.conf, commented out the offending nameserver line, restarted ejabberd, and voilà - it all started to work.
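The check that finally exposed the problem (asking each nameserver from /etc/resolv.conf for the same SRV record, one server at a time) can be sketched in a few lines of Python. This is only an illustration, not what I ran at the time; the actual diagnosis was done by hand with dig. The function names, the 3-second timeout and the record name <code>_xmpp-server._tcp.example.org</code> are assumptions made up for the example, and the script only checks whether a server replies at all, without decoding the answer.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary id used to match a reply to our query


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses found in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Commented-out lines start with '#' or ';', so their first
        # field is never exactly "nameserver" and they are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_srv_query(name):
    """Build a minimal DNS query packet asking for the SRV record of name."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", QUERY_ID, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 33 = SRV, QCLASS 1 = IN
    return header + qname + struct.pack("!HH", 33, 1)


def server_answers(server, name, timeout=3.0):
    """Return True if server sends any reply to the SRV query in time."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_srv_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable servers
            return False
    # Accept the reply only if it carries the id we sent.
    return reply[:2] == struct.pack("!H", QUERY_ID)


def find_silent_servers(resolv_conf_text, name, probe=server_answers):
    """Return the listed nameservers that did not answer the query."""
    return [
        server
        for server in parse_nameservers(resolv_conf_text)
        if not probe(server, name)
    ]
```

Calling <code>find_silent_servers(open("/etc/resolv.conf").read(), "_xmpp-server._tcp.example.org")</code> should return the list of nameservers that stayed quiet. Each server is probed separately and with a timeout, so one dead entry cannot hide behind the others or hang the whole check.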
And the lesson of the day? The next time you have a problem with a network server, <i>make sure</i> that the issue really isn't a network problem. All kinds of funny things can happen with the network and network-related services, and very often the network is the main culprit when something stops working.