deutrino's picture

I build and throw away a lot of Turnkey VMs just doing what I do, and the last several I've done have had the same problem with Webmin choking on the first one to three (?) connection attempts every so often ... so like, hit https://box:12321/ and get connection error, reload once or twice, get the login page. Log in, maybe stuff works. Then maybe log into adminer, and go thru the same song & dance after which you're logged into adminer and it works fine for a while longer.

All these VMs are running on one of two Proxmox hosts, actual VM not LXC, running on old but unremarkable Intel based PCs.

This is happening quite reliably for me... anyone else out there seeing it?

 

Jeremy Davis's picture

Thanks for reporting. It certainly doesn't sound ideal.

Thanks too for the explicit note that it's in a VM (not LXC).

When it fails in your browser, if you refresh a few times it eventually works, right? Is this only immediately after boot? If so, does firstboot, reboot or clean boot make any difference? Or does it happen after it's been idle a while? Or can it occur any time?

Also, I'm not 100% clear whether you're saying that Adminer login is initially problematic too or not? If Adminer is displaying similar behavior, my guess is something lower level, maybe network related? FYI Webmin runs on its own mini-server and is reverse proxied by stunnel; Adminer is hosted by Apache - so system networking seems the only common factor.

If it's just Webmin, then I'd assume that it's probably stunnel, or maybe Webmin itself. If it's just after boot, then it'd be interesting to see the boot logs (i.e. from the journal). If it's not just at boot but happens after it's been sitting idle for a while, perhaps check resource usage/availability?
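
If it helps with the resource check, here is a rough Python sketch (not a TurnKey tool, just an illustration) that prints a timestamped line with the load average and available memory, so it can be run on the appliance at the moment a connection fails. It assumes a Linux guest with `/proc/meminfo`; the function names `parse_meminfo` and `snapshot` are made up for this example.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that is not "Name: <number> [unit]".
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def snapshot():
    """Return a one-line summary of load average and available memory."""
    load1, load5, load15 = os.getloadavg()
    with open("/proc/meminfo") as fh:  # assumption: Linux guest
        mem = parse_meminfo(fh.read())
    available_mb = mem.get("MemAvailable", 0) // 1024
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"{stamp} load={load1:.2f}/{load5:.2f}/{load15:.2f} "
        f"mem_available={available_mb}MB"
    )


if __name__ == "__main__":
    print(snapshot())
```

Running it a few times while the login page is failing, and again while it is working, should show whether the box is actually short on memory or CPU at the time.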

TBH, very occasionally I've noticed similar behavior when I'm doing my final manual pre-release testing (on LXC as well). I rarely use Webmin so the only time I've experienced it has been immediately after firstboot or a reboot. As it has been rare and there seemed to be no rhyme or reason to it and it always resolved itself with no apparent longer term issue, I have always just assumed that on those occasions I was too quick (hit it before it had finished starting).

Intermittent issues are the worst. Usually if you can work out the exact steps to reproduce an issue, you can have a red hot go at fixing it. By their nature, the requirements of intermittent issues are obscure, or at least obscured - making them really hard to diagnose. You can really only diagnose it while the problem is occurring and even then it can be hard.

Nicholas Barnes's picture

For what it's worth, I've been seeing this too - exactly as the OP describes.

It happens on both containers and VMs, definitely straight after installation, but I'm not sure about other times. I'll have a play and see what it needs to reproduce.

Nicholas. 

Jeremy Davis's picture

Thanks Nicholas for confirming your similar experience. Happy to hear any further info you can add.

FWIW, I am considering removing Webshell (and Stunnel) for our next major release; v18.x. Newer versions of Webmin (than what's currently available in TurnKey v17.x) now include a proper interactive shell (so I've read, I haven't tested). So that makes Webshell somewhat redundant. Webmin can run directly self hosted, so we might as well get rid of Stunnel too. I suspect that might make it start quicker at boot.

deutrino's picture

Hey, just wanted to update and note that I'm seeing this with LXC hosts as well.

It's not only after the first boot for me, I think it might only be hitting on the start of an SSL session? Not sure, just guessing. I suspected maybe some weird problem with stunnel but haven't had any spare energy to try to diagnose anything yet. Will just keep an eye out and see if I spot more patterns.
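
One way to check the "start of an SSL session" guess would be a small probe that opens a fresh connection each time and records whether the failure happens at the TCP connect or during the TLS handshake. The following is only a sketch: the host name `box` and port 12321 come from the URL in the first post, while the function names, attempt count and delay are arbitrary choices for illustration. Certificate verification is switched off on the assumption that the appliance uses a self-signed certificate.

```python
import socket
import ssl
import time


def probe_once(host, port, timeout=5.0):
    """Try one TCP connect plus TLS handshake and report how far it got."""
    context = ssl.create_default_context()
    # Assumption: self-signed certificate, so do not verify it.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return {"stage": "tcp", "error": repr(exc),
                "seconds": time.monotonic() - start}
    try:
        # The handshake runs inside wrap_socket on a connected socket.
        with context.wrap_socket(raw, server_hostname=host):
            pass
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        raw.close()
        return {"stage": "tls", "error": repr(exc),
                "seconds": time.monotonic() - start}
    return {"stage": "done", "error": None,
            "seconds": time.monotonic() - start}


def summarize(results):
    """Count how many attempts ended at each stage."""
    counts = {"tcp": 0, "tls": 0, "done": 0}
    for result in results:
        counts[result["stage"]] += 1
    return counts


if __name__ == "__main__":
    results = []
    for attempt in range(20):
        result = probe_once("box", 12321)
        print(time.strftime("%H:%M:%S"), attempt, result)
        results.append(result)
        time.sleep(1)
    print(summarize(results))
```

If most failures land in the `tls` stage while `tcp` always succeeds, that would point at whatever terminates TLS (stunnel here) rather than at networking in general.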

 

Jeremy Davis's picture

Thanks for the update.

I suspect that you are right and it's something funky going on with stunnel.

FWIW, Webmin can terminate its own SSL/TLS connections fine. The only reason it's behind stunnel is historic. I've already been considering removing stunnel in v18.x.

So if you felt like testing that out, I'd be super interested in any feedback on whether that seems to resolve the issue (or not). I'd be surprised if it didn't.

As an aside, in v18.x I'm also looking at removing Webshell (aka shellinabox). Newer Webmin (which we don't yet package - but it's fairly high on my todo list) includes a proper interactive shell; the version we currently provide, and earlier ones, only have a limited shell that does not support interaction. So it seems like duplication to also provide Webshell. Happy to hear your thoughts on that (if you have any).
