InformB

I've been using Windows Server 2012 R2 Standard for almost a decade now. I have apps running on the server which have created about 50,000 local users. These local users are created using WMIC scripts and equivalent APIs. The purpose is to allow for easy integration with IIS authentication: the app creates a local user account, and IIS uses the built-in Windows authentication to authenticate the user.

It's been running great on 2012 R2, but I recently decided to upgrade to Server 2019 Datacenter edition, so I created a new 2019 server and used the same scripts to recreate the 50,000 users on it.

However, when I reboot the 2019 server, it just hangs for hours at the spinning circle and uses 100% CPU. It never gets to the login screen. I've never had this issue with 2012 R2. To test it, I spun up a new 2012 R2 server and recreated the same 50,000 local users with the scripts; it took a few minutes to reboot, but that was it.

Is there something different about Server 2019 Datacenter vs. 2012 R2 Standard that causes the 2019 server to hang on boot when there are 50,000 local users? Is it trying to process all the users before presenting the login screen? (I've set it to log in automatically to an admin account, but it never seems to get there.) I'm at my wits' end now and I'm open to suggestions on how to debug this further or which settings to look at. I've checked the group security and logon policies, and they're the same on the 2012 and 2019 servers. What am I overlooking?

They're both running on AWS with 1 GB RAM. Thanks in advance, and please don't hesitate to throw out whatever ideas you may have. I cannot change the architecture at this time, and the easiest way to integrate IIS with authentication is to create a local user account. The accounts don't need local login privileges, only network login (for IIS), so they are added to a custom group on the Windows server and removed from the standard Users group.
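For anyone trying to reproduce the setup, here is a rough sketch of the per-user provisioning steps described above. It is not the original WMIC script: it swaps in the standard `net user` and `net localgroup` commands, and the function name, the user name, the password, and the group name `IISAppUsers` are all made up for illustration. The sketch only builds the command lines and does not execute anything.

```python
def provisioning_commands(username, password, group):
    """Build the commands for one network-only local user (hypothetical helper).

    Assumption: plain `net` commands stand in for the WMIC scripts
    mentioned in the post, which are not shown in the thread.
    """
    return [
        # Create the local account.
        ["net", "user", username, password, "/add"],
        # Add it to the custom group used for IIS authentication.
        ["net", "localgroup", group, username, "/add"],
        # Remove it from the standard Users group.
        ["net", "localgroup", "Users", username, "/delete"],
    ]


if __name__ == "__main__":
    # Placeholder values, not taken from the original post.
    for cmd in provisioning_commands("app_user_00001", "Example-Passw0rd!", "IISAppUsers"):
        print(" ".join(cmd))
```

Running each list through `subprocess.run(cmd, check=True)` in an elevated session on a test box would recreate one account; looping that 50,000 times should approximate the state being discussed.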
 
Solution
It's possible that the issue is related to the number of local users on the 2019 server. While there is no hard limit on the number of local users that can be created on a Windows server, excessive numbers of users can cause performance issues.
You may want to try some troubleshooting steps to determine the cause of the issue:
1. Check the event logs on the 2019 server to see if there are any errors or warnings that might indicate what is causing the hang.
2. Try booting the server in safe mode to see if it can successfully boot without all of the installed services and drivers. If it does boot successfully, you can begin to narrow down the cause by selectively enabling services and drivers until the issue returns.
3. Check the system...
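As a rough illustration of step 1 above, the snippet below builds a `wevtutil` query for the most recent critical, error, and warning events in a log. The helper name and the defaults (the System log, 50 events) are assumptions for the example, and the command is only built here, not run.

```python
def event_query_command(log="System", count=50):
    """Build a wevtutil command listing recent problem events (hypothetical helper)."""
    # Levels 1, 2 and 3 are Critical, Error and Warning.
    xpath = "*[System[(Level=1 or Level=2 or Level=3)]]"
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",   # XPath filter on event level
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


if __name__ == "__main__":
    print(event_query_command())
```

On the server itself, passing the list to `subprocess.run(..., capture_output=True, text=True)` from an elevated prompt would return the events as text. If the machine never reaches the desktop, the same logs would have to be read some other way, for example from the volume attached to a second instance.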
Looking at the procmon logs for the 2019 server more carefully, I noticed that it's not that LSASS is taking a long time to parse the 50K users from SAM; it appears to be stuck in a loop! LSASS keeps reading all the SAM entries over and over again, for 12 hours. Here's an excerpt from the logs showing where it ends one loop and then starts another. It's the exact same loop every time. I can't see where the looping finally stops because procmon timed out after a few hours of logging, but for those few hours it finishes the last user, then reads the services NTDS, and then restarts the SAM user loop. Can anyone make any sense of this?
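One way to put a number on the looping, sketched under some assumptions: export the procmon trace to CSV and count how often LSASS comes back to the first SAM user key it read. The code assumes the export has `Process Name` and `Path` columns and that the user keys sit under `HKLM\SAM\SAM\Domains\Account\Users`. The sample rows are invented for the example and are not taken from the real trace.

```python
import csv
from collections import Counter

SAM_USERS = r"HKLM\SAM\SAM\Domains\Account\Users"

# Made-up sample: two user keys read twice, with an NTDS read in between.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"1:00:00","lsass.exe","600","RegOpenKey","HKLM\SAM\SAM\Domains\Account\Users\000003E9","SUCCESS",""
"1:00:01","lsass.exe","600","RegOpenKey","HKLM\SAM\SAM\Domains\Account\Users\000003EA","SUCCESS",""
"1:00:02","lsass.exe","600","RegOpenKey","HKLM\System\CurrentControlSet\Services\NTDS","SUCCESS",""
"1:00:03","lsass.exe","600","RegOpenKey","HKLM\SAM\SAM\Domains\Account\Users\000003E9","SUCCESS",""
"1:00:04","lsass.exe","600","RegOpenKey","HKLM\SAM\SAM\Domains\Account\Users\000003EA","SUCCESS",""
"1:00:05","svchost.exe","900","RegOpenKey","HKLM\SAM\SAM\Domains\Account\Users\000003E9","SUCCESS",""
'''


def sam_user_reads(lines, process="lsass.exe", prefix=SAM_USERS):
    """Return the Path of every row from `process` under the SAM users key."""
    paths = []
    for row in csv.DictReader(lines):
        same_process = row["Process Name"].lower() == process.lower()
        under_prefix = row["Path"].upper().startswith(prefix.upper())
        if same_process and under_prefix:
            paths.append(row["Path"])
    return paths


def count_passes(paths):
    """Count how many times the first path seen shows up, i.e. passes started."""
    if not paths:
        return 0
    return sum(1 for p in paths if p == paths[0])


if __name__ == "__main__":
    reads = sam_user_reads(SAMPLE.splitlines())
    print("passes started:", count_passes(reads))
    print("reads per key:", Counter(reads).most_common(3))
```

For a real export, pass an open text file instead of `SAMPLE.splitlines()`. If every key shows the same read count and that count keeps growing with the length of the trace, that would back up the "same loop every time" observation; uneven counts would suggest something else is going on.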