So, recently I was asked to help out with an IT problem a friend had. The issue was as follows: a Windows Server 2016 machine had ended up booting into safe mode after some botched update … or so …
The details weren’t really clear, because one of the remedies tried before I got called in had been to restore the system to its state from roughly three months earlier (via System Restore, as far as I understand). The machine was configured as the DC in a small Active Directory domain. It was an emergency because, with the DC unavailable, none of the services could be used. This being a doctor’s practice, that made working that day “inconvenient” for the staff, to put it euphemistically. The hope was that this would remain the only day during which they’d have to fall back to pen and paper.
Anyway, the point was that something seemed amiss, and so the friend in question popped over before noon, Mac in hand, asking for my help via a TeamViewer session he had established with the server.
Of course, I first made sure to run sfc /scannow, which (as usual) yielded some spurious errors but, most importantly, triggered certain self-repair mechanisms. After that I ran the usual dism /Online /Cleanup-Image /RestoreHealth. In parallel, I looked at the event log and tried to ascertain whether there were any disk errors¹. Worryingly, it took probably half an hour for dism to even show its progress bar. Once it did, though, everything seemed okay. The run took ages and the friend left to go to bed². A few hours later I was called again and decided to pop over to have a look at the server in person, with screen and keyboard.
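For reference, the two repair passes can be scripted in the order used here. This is a minimal sketch, not what was actually run: run_repairs, REPAIR_STEPS and the injectable runner parameter are hypothetical names, it assumes an elevated prompt on Windows, and it makes no assumptions about what the tools’ exit codes mean; it simply records them.

```python
import subprocess
from typing import Callable, Sequence

# The two repair commands, in the order used in the text.
REPAIR_STEPS = [
    ["sfc", "/scannow"],
    ["dism", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]

# A runner takes an argument list and returns an exit code.
Runner = Callable[[Sequence[str]], int]


def default_runner(cmd: Sequence[str]) -> int:
    # Needs an elevated prompt on Windows; output goes straight to the console.
    return subprocess.run(list(cmd)).returncode


def run_repairs(runner: Runner = default_runner) -> list[tuple[str, int]]:
    """Run each repair step in turn and record (command line, exit code)."""
    results = []
    for cmd in REPAIR_STEPS:
        code = runner(cmd)  # keep going regardless of the exit code
        results.append((" ".join(cmd), code))
    return results
```

Passing a stand-in runner, e.g. run_repairs(lambda cmd: 0), lets you check the sequence without touching a real system.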
I looked into the event log once again, but unsurprisingly it had huge gaps. It was evident that NTDS³ wasn’t being started because safe mode was active. Peeking into the registry at the control sets, at which one was selected as the current control set, and at their parameters suggested that something was already going wrong at boot time. So it was time to ask bcdedit. And sure enough, it showed safeboot=DsRepair (i.e. Directory Services Repair Mode) on the {current} boot entry.
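Spotting that setting can be automated by parsing the text that bcdedit /enum prints. The sketch below assumes the English-locale layout (entries separated by blank lines, a title underlined with dashes, then one "name   value" line per element) and that name and value are separated by at least two spaces; the sample text is made up for illustration, not captured from this server, and the function names are hypothetical.

```python
import re

# Illustrative sample shaped like English-locale `bcdedit /enum` output.
# Made up for this sketch; not captured from the server in the story.
SAMPLE = """\
Windows Boot Manager
--------------------
identifier              {bootmgr}
description             Windows Boot Manager
displayorder            {current}
timeout                 30

Windows Boot Loader
-------------------
identifier              {current}
path                    \\Windows\\system32\\winload.exe
description             Windows Server 2016
safeboot                DsRepair
"""


def parse_bcdedit_enum(text: str) -> list[dict[str, str]]:
    """Split bcdedit output into one dict per entry (element name -> value)."""
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:  # a blank line ends the current entry
            if current:
                entries.append(current)
            current, last_key = {}, None
            continue
        if set(stripped) == {"-"}:  # underline below an entry's title
            continue
        if line[0].isspace() and last_key:  # continuation of a multi-valued element
            current[last_key] += " " + stripped
            continue
        # Assumption: name and value are separated by at least two spaces.
        parts = re.split(r"\s{2,}", stripped, maxsplit=1)
        if len(parts) == 2:
            last_key = parts[0]
            current[last_key] = parts[1]
        else:  # a title line such as "Windows Boot Loader"
            current["_title"] = stripped
            last_key = None
    if current:
        entries.append(current)
    return entries


def entries_with_safeboot(entries: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (identifier, safeboot value) for every entry that has safeboot set."""
    return [(e.get("identifier", "?"), e["safeboot"]) for e in entries if "safeboot" in e]
```

On a real machine you would feed it the output of bcdedit /enum from an elevated prompt, e.g. subprocess.run(["bcdedit", "/enum"], capture_output=True, text=True).stdout. The layout may differ on non-English installations, so treat the parsing rules as an assumption to verify.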
Hmm, so what to do? My initial hunch was to copy the {current} boot configuration to a new entry in which the following were also set: quietboot=off, sos=on, lastknowngood=on and nocrashautoreboot=on. The idea was to get more verbose output during boot and to keep a crash at boot time from being masked by an automatic reboot. Alas, rebooting meant waiting more than an hour, because evidently Windows Update or some other opaque process was trying to do something during shutdown. We’re still not sure what, and the event log didn’t exactly help shed any light on it either.
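The verbose clone boils down to one bcdedit /copy followed by one bcdedit /set per setting. The sketch below only builds those command lines as argument lists and pulls the new entry’s GUID out of the /copy output. Assumptions: the description "Verbose boot" is a hypothetical choice, the element names are taken as written above and assumed to be accepted by bcdedit /set as-is, and only the braced GUID in the /copy message is relied upon, not its exact wording.

```python
import re

# Matches a braced GUID such as {12345678-1234-1234-1234-123456789abc}.
GUID_RE = re.compile(r"\{[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}")

# The settings named in the text; spelling assumed to match bcdedit element names.
VERBOSE_SETTINGS = {
    "quietboot": "off",
    "sos": "on",
    "lastknowngood": "on",
    "nocrashautoreboot": "on",
}


def copy_command(source: str = "{current}", description: str = "Verbose boot") -> list[str]:
    """bcdedit invocation that clones `source` under a new (hypothetical) description."""
    return ["bcdedit", "/copy", source, "/d", description]


def extract_new_id(copy_output: str) -> str:
    """Pull the new entry's GUID out of whatever text bcdedit /copy printed."""
    match = GUID_RE.search(copy_output)
    if match is None:
        raise ValueError("no GUID found in bcdedit /copy output")
    return match.group(0)


def set_commands(identifier: str, settings: dict[str, str] = VERBOSE_SETTINGS) -> list[list[str]]:
    """One bcdedit /set invocation per setting, applied to the cloned entry."""
    return [["bcdedit", "/set", identifier, name, value] for name, value in settings.items()]
```

In practice you would run copy_command() (for instance via subprocess.run(..., capture_output=True, text=True)), hand its stdout to extract_new_id, and then run each of set_commands(new_id) from an elevated prompt.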
Either way, we booted into the copied configuration and thereby confirmed that there was no BSOD at boot time that we had been missing due to an automatic reboot. Yet we ended up in safe mode once again. But why?
By then I already had a hunch, based on this blog article that I had found earlier. Now, the server was also running Veeam, but no VMs were involved. Was the local Veeam agent to blame, having placed the machine into safeboot=DsRepair but never undoing the change, e.g. because Veeam got interrupted? We may never know.
However, the remedy suggested in the aforementioned blog post did work: bcdedit /deletevalue {default} safeboot. In our case I opted to make the change on the {default} entry, since we had booted into the cloned boot configuration and {current} therefore referred to the clone rather than the original entry. And since we had to reboot anyway to get out of safe mode, we did so. This time the reboot was fairly quick, and afterwards it was immediately clear that the server was back in service.
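The fix itself is a single command. The sketch below wraps it with a dry-run default and encodes the reasoning about which identifier to target: while booted into a cloned entry, {current} names the clone, so the original is addressed via {default}. That assumes the original entry was still the default one, as it was here; the function names are hypothetical.

```python
import subprocess


def target_entry(booted_into_clone: bool) -> str:
    """Pick the BCD identifier whose safeboot value should be removed."""
    # Assumption: the original entry is still the default boot entry.
    return "{default}" if booted_into_clone else "{current}"


def remedy_command(identifier: str) -> list[str]:
    """bcdedit invocation that deletes the safeboot element from one entry."""
    return ["bcdedit", "/deletevalue", identifier, "safeboot"]


def apply_remedy(identifier: str, dry_run: bool = True) -> str:
    """Return the command line (dry run) or bcdedit's output (real run)."""
    cmd = remedy_command(identifier)
    if dry_run:
        return " ".join(cmd)  # only show what would be executed
    # Needs an elevated prompt on Windows; raises CalledProcessError on failure.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Either way, a reboot is still required afterwards to actually leave safe mode.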
Problem solved.
// Oliver