IPA Survival Skills - Part 1

Oh $#*^

Those are the words you never want to hear your Junior IdM Administrator say at 5:00 PM. Those words mean you're going to be in for a bit of a headache, but no worries, you set up multiple replicas in geographically diverse locations so that one IPA server going down doesn't take the rest of your network with it, right?

Right?

Oh.

IPA Survival Skills

I'm going to walk you through the basic troubleshooting steps I use at my company to figure out what exactly is making an IPA client or server start taking on water. I write this in the hope that you can save it before it keels over, or failing that, at the very least have some good information for your support team so they can hit the ground running.

Document Everything(TM)

The moment you notice IPA has gone haywire, start gathering debugging information. In your /etc/sssd/sssd.conf, set debug_level = 9 under every section. Once you have done that, run the following string of commands:

# systemctl stop sssd; rm -rf /var/log/sssd/* /var/lib/sss/{db,mc}/*; systemctl start sssd; sleep 120; sosreport -a

Grab that sosreport and keep it safe. It is going to be your initial vault of important information for support to use.
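
If your sssd.conf has a pile of sections, adding debug_level to each one by hand gets old fast. Here is a rough sketch of one way to script it. The function name set_sssd_debug_level is made up for illustration, and it assumes a plain INI-style file where each section header sits on its own line. It only prints the modified config, so nothing gets touched until you decide to copy it into place.

```shell
# Hypothetical helper (not part of sssd): print a copy of an sssd.conf
# with debug_level set under every [section].
set_sssd_debug_level() {
    conf=$1
    level=${2:-9}   # defaults to 9, the level used in this post
    awk -v level="$level" '
        # Drop any debug_level line that is already there.
        /^[ \t]*debug_level[ \t]*=/ { next }
        # Pass every other line through unchanged.
        { print }
        # Right after each [section] header, add the new setting.
        /^[ \t]*\[.*\][ \t]*$/ { print "debug_level = " level }
    ' "$conf"
}
```

Something like set_sssd_debug_level /etc/sssd/sssd.conf 9 > /tmp/sssd.conf.debug gives you a scratch copy to eyeball first. When you copy it back, keep the file owned by root with mode 0600, since sssd is picky about that.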

Also, start thinking about what has changed. Have you updated recently? Did your Windows team decommission an AD server and not send out a change notice? Has your networking team gone mad with power and started fiddling with DNS? These are all seemingly little things that I have heard of knocking massive deployments offline.

Do Some Basic Sleuthing

Have you checked the system journal? What about /var/log/secure or /var/log/audit/audit.log? Did your security team make any sneaky patches (/var/log/{dnf,yum}.log) and not tell you?
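
To make that log sweep a little less tedious, here is a minimal sketch of a grep wrapper. The name scan_logs is hypothetical, and the patterns in the comments are only guesses at what is worth searching for on your systems. It skips any file that does not exist, which is handy because not every distro has both a dnf.log and a yum.log.

```shell
# Hypothetical helper: search several log files for a pattern,
# quietly skipping any file that is missing or unreadable.
scan_logs() {
    pattern=$1
    shift
    for log in "$@"; do
        [ -r "$log" ] || continue
        # -i ignore case, -E extended regex, -H show file name, -n show line number.
        # "|| true" keeps a no-match result from being treated as a failure.
        grep -iEHn -- "$pattern" "$log" || true
    done
}

# Possible uses, run as root (the patterns are assumptions, adjust to taste):
#   journalctl -u sssd --since "1 hour ago" --no-pager
#   scan_logs 'sssd|krb5|pam_sss' /var/log/secure /var/log/audit/audit.log
#   scan_logs 'sssd|ipa|krb5' /var/log/dnf.log /var/log/yum.log
```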

Try Googling some of the errors you're seeing. You would be shocked how often you're able to simply resolve a problem just by doing 10-15 minutes of research. Also, even though this may be cursed, try using Bing as well. Sometimes one search engine can pick up something the other doesn't. Plus Bing gives you free Taco Bell, so why not?

Can't Login? Do A Kerberos Trace

This is one of those things your support team may end up asking for, so feel free to gather it along with the sosreport. Run KRB5_TRACE=/tmp/krb5_trace_$(hostname)_$(date +"%m-%d-%Y_%H.%M.%S").log kinit <user>@<DOMAIN IN ALL CAPS>. Take a look at that log file, try Googling some of the errors in it, and then save it somewhere safe.
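
If you end up running that trace on more than one box, it may be worth wrapping it in a couple of small functions. This is only a sketch: krb5_trace_path and trace_kinit are names I made up, and the path format simply mirrors the one-liner above. The wrapper assumes kinit is on your PATH and that you pass it a full principal such as user@EXAMPLE.TEST (a placeholder).

```shell
# Hypothetical helper: build a trace file name from the hostname and a timestamp.
krb5_trace_path() {
    printf '/tmp/krb5_trace_%s_%s.log\n' "$(hostname)" "$(date +"%m-%d-%Y_%H.%M.%S")"
}

# Hypothetical wrapper: run kinit with KRB5_TRACE set, then report where the trace went.
trace_kinit() {
    principal=$1
    trace_file=$(krb5_trace_path)
    status=0
    # KRB5_TRACE is set only for this one kinit invocation.
    KRB5_TRACE="$trace_file" kinit "$principal" || status=$?
    echo "kinit exited with status $status; trace saved to $trace_file" >&2
    return "$status"
}
```

Calling trace_kinit with your principal prompts for the password as usual and leaves the trace in /tmp, whether or not the login succeeds.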

Reboot

This really doesn't apply to IPA servers, but if you have an IPA client that has gone haywire, just try rebooting. I've gotten so many weird looks from behind the screen when I tell fellow admins that, and it doesn't always work, but more often than not it does. Blame the Gremlins.
