Richard's picture

Please can you make the following change for the next release?

Add KillMode=process to the [Service] section of /lib/systemd/system/ssh@.service

Taken from here

Jeremy Davis's picture

I just tested a number of TurnKey servers (both as LXC CT and KVM VM) and I cannot recreate the issue you are noting?!

Here are the steps I took (from a Debian Bullseye desktop - to a fairly fresh v16.1 WordPress LXC CT running on PVE):

  • Open terminal application (on Desktop).
  • Log into (v16.1 CT) server via SSH.
  • Start screen session on server.
  • Close terminal application on desktop (thus severing the SSH connection, without closing the screen session).
  • Open a new terminal window on the desktop, then open a new SSH session into the (v16.1 CT) server.
  • Confirm existing screen session still exists. It does!?

Digging a little deeper, whilst I have the ssh@.service file you noted, it's not currently in use for me?! So I'm at a bit of a loss as to why your system seems so different to the ones I've looked at? You mentioned in your other recent post that you are running as an LXC container on Proxmox. OTTOMH, the only "simple" difference that might make sense of such significant differences between our servers (beyond you doing lots of customisation) is perhaps if you are running it as a privileged container?

If you can recreate these issues from a fresh TurnKey instance, please share how you do that. If I can recreate the issue, I can almost certainly fix it (or at least provide a solid workaround).

Richard's picture

Hi Jeremy,

Here are my steps on a TKL Core 16.1 container deployed from scratch on Proxmox 7.1-4:

  • Start top in a local console to the container in Proxmox
  • Open terminal application (on Desktop)
  • Log into (v16.1 CT) server via SSH from pve as root using publickey
  • Start screen session on server
  • Ctrl-A, d    - to detach
  • exit
  • Watch sshd & screen both killed in top

The above change fixed the behaviour. How can you tell if your system is using the systemd .socket or .service? The article talked about swapping from socket to service but didn't say how, so I went with the easy option and it worked.

Unprivileged container = Yes

Just tried it on a freshly deployed turnkey-core 16.1-1 container and got the same result.

Called testTKL, oh how I smiled :-)

Richard's picture

Ctrl-a, d (not uppercase)

Jeremy Davis's picture

This is a super weird one!

I definitely cannot reproduce this at all!

I had left the container running the other day when I first responded to you. I logged in via PCT (from the Proxmox host) and SSH, and my 2 screen sessions are still running from when I was testing the other day. My laptop has gone to sleep (at least once every 24 hours) within that time frame, so even if I hadn't exited out of any remaining SSH sessions, all connections would have been closed. Regardless, I double checked for any open SSH sessions (within the pct terminal) to be sure. There weren't any.

I logged in via SSH (in another terminal) and again confirmed both of my screen sessions. It was interesting to note that a new 'sshd' process did start when I logged in (for a total of 2). That new 'sshd' process exited once I exited the SSH session. I tried opening 2 separate SSH sessions and there were 3 'sshd' processes running. So it seems additional processes are still used by SSH, even if not using the socket/one instance per connection config. I double checked and my system is definitely not using the socket.

Bottom line, the change you are suggesting will only make a difference if other config is changed. Unfortunately, I have no idea how to move to the socket config and I'm not clear if there is any real advantage.

Also, unless you understand the consequences of what you are doing, editing files in /usr/lib (or /usr in general, with the exception of /usr/local) is generally frowned on. That is because those file trees are generally managed by package management. Creating new files that don't already exist is rarely going to be an issue (although most apps that accept modifications to data in /usr usually have "safe" places for changes in /etc). The main issue is that future apt updates will nuke your changes.

For the specific case of systemd service files, the right way is to either create a complete replacement service file, or add an "override" snippet with just the additions/changes.

To create a full replacement service file, put it in the same subpath as the file you're overriding, but under /etc. E.g. to override /lib/systemd/system/ssh@.service, put your new service file at /etc/systemd/system/ssh@.service.
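
For example, a rough sketch of the replacement-file approach (the KillMode addition being the one discussed above):

cp /lib/systemd/system/ssh@.service /etc/systemd/system/ssh@.service
# then edit /etc/systemd/system/ssh@.service and add KillMode=process under its [Service] section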

To just add a setting as you have, you can use an override snippet. Do that by creating a directory in /etc with the same name as the service, but with '.d' on the end. E.g.: /etc/systemd/system/ssh@.service.d/. Then create a file 'override.conf' inside that, containing just the section heading and the setting you are adding. E.g. for your case, you would have a file "/etc/systemd/system/ssh@.service.d/override.conf" and its contents would be:

[Service]
KillMode=process
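
Whichever approach you go with, systemd needs to re-read its unit files before the change is picked up, and 'systemctl cat' will confirm it took effect:

systemctl daemon-reload
systemctl cat ssh@.service    # should now show your replacement file or drop-in with KillMode=process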

Both options have pros and cons, although for a single addition such as the one you note here, I'd suggest that the latter method is cleaner.

Richard's picture

I think the thing we need to establish is why my containers are using the socket and yours appear to be using the service. Is there some sort of decision or set of conditions? I honestly haven't messed with anything and don't have time to figure it out right now, sorry.

Jeremy Davis's picture

Yes, I'd be very interested to understand how that happened too!

It may not explain it, but could you please share the output of the following:

systemctl status ssh.service
systemctl status ssh.socket

Thinking some more, it's not default Debian behaviour, nor how we configure it. It's definitely not like that on any of our other builds (I've double checked and it's not like that on a full clean install from ISO, not even in the extracted ISO squashfs filesystem). I've also had a quick grep through our ISO to LXC conversion tool (all our builds start as ISO, then we use that base for all our other builds). As you can see, that doesn't touch anything to do with ssh service config.

Perhaps there's something that I'm missing, but I feel fairly confident that it's something that's happening after you've downloaded the template. As you are 100% sure that it's nothing you've done AND you've recreated the issue locally, the only thing that makes any sense to me is that it's Proxmox doing it (at launch time I assume?). Or perhaps there is some specific config and/or issue on your server that forces SSH service to fail and it tries to fall back on the socket? As you are likely aware, LXC leverages the host system, so unlike most other virtual platforms the host config can impact the container. (Although this specific scenario makes no sense to me).

Actually, another possibility that does occur to me is that my Proxmox was originally installed when it was v1.x (about 10 years ago). Since then I've done Debian style "in place" upgrades, plus I'm still only running v6.x (we've just started our v17.0 release and I rely on my PVE server during development). I will update it, but only after I've done the initial v17.0 LXC builds (to confirm they work ok on v6.x, as well as v7.x). So there is a possibility that my install has some old cruft that affects the behaviour, and/or that PVE v7.x does some additional container pre/post launch configuration that changes it. I don't really understand why they might do that, and after looking through the release notes for v7.0beta1, v7.0 & v7.1 there doesn't seem to be anything explicit that might be a cause...

Maybe it's worth asking about it on the Proxmox forums? Perhaps the installation and configuration subforum? (the other PVE subforum is networking and firewall which doesn't seem appropriate).

FWIW we have lots of Proxmox users, and this has never been reported elsewhere. Perhaps very few of them use screen, tmux or similar? Or perhaps there is something specific to your setup? I've also spent a fair bit of time searching to see if I could find anyone else reporting this; unfortunately I couldn't. Obviously that doesn't mean that it's just you experiencing this, but it does suggest that it's not that common. OTOH it is worth noting that there is precedent for people experiencing TurnKey PVE LXC guest issues that I can't reproduce. So perhaps this is another of those...?

Ultimately though, the (spirit of the) change you suggest doesn't have any significant impact. We try to avoid changing Debian defaults as much as possible, beyond explicit security hardening and/or bugfixes. But even though I can't recreate it, making this change isn't really that big a deal. So perhaps I might consider that. I've opened a new issue on our tracker so it doesn't get forgotten.

Final word though, if I haven't already mentioned it (and apologies if I have), you shouldn't edit the files in /lib/systemd. They will be overwritten on package update. Instead use 'systemctl edit' (or manually create the override file structure in /etc/systemd). E.g.:

systemctl edit ssh@.service
Richard's picture

The plot thickens!

Here's the output you asked for:

root@host /home/erp# systemctl status ssh.service
* ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:sshd(8)
           man:sshd_config(5)
root@host /home/erp# systemctl status ssh.socket 
* ssh.socket - OpenBSD Secure Shell server socket
   Loaded: loaded (/lib/systemd/system/ssh.socket; enabled; vendor preset: enabled)
   Active: active (listening) since Tue 2022-04-12 01:17:48 BST; 1 weeks 3 days ago
   Listen: [::]:22 (Stream)
 Accepted: 47; Connected: 1; Refused: 1
    Tasks: 0 (limit: 9426)
   Memory: 0B
      CPU: 0
   CGroup: /system.slice/ssh.socket
Apr 12 01:17:48 host systemd[1]: Listening on OpenBSD Secure Shell server socket.

The odd thing is I can't reproduce it on my other containers, and the same commands yield very different results! The only thing unusual about this container is that it was created by restoring a backup of one of my other containers and modifying the IP, hostname, etc. in Proxmox. Here's the output of the one that was cloned:

root@shared1 /home/erp# systemctl status ssh.service
* ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2022-04-12 06:18:24 BST; 1 weeks 3 days ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 442028 (sshd)
    Tasks: 9 (limit: 9426)
   Memory: 25.0M
      CPU: 2.548s
   CGroup: /system.slice/ssh.service
           |-442028 /usr/sbin/sshd -D
           |-498772 sshd: erp [priv]
           |-498792 sshd: erp@pts/2
           |-498793 -bash
           |-498804 sudo su
           |-498805 su
           |-498806 bash
           |-498815 systemctl status ssh.service
           `-498816 less -X -R -F
Apr 22 20:44:17 shared1 sshd[498717]: Received disconnect from 10.0.0.1 port 44954:11: disconnected by user
Apr 22 20:44:17 shared1 sshd[498717]: Disconnected from user erp 10.0.0.1 port 44954
Apr 22 20:44:17 shared1 sshd[498698]: pam_unix(sshd:session): session closed for user erp
Apr 22 20:53:32 shared1 sshd[498772]: Accepted publickey for erp from 10.0.0.1 port 44956 ssh2: ED25519 SHA256:O6EnwdTuWtvFY
Apr 22 20:53:32 shared1 sshd[498772]: pam_unix(sshd:session): session opened for user erp by (uid=0)
Apr 22 20:54:05 shared1 sudo[498804]:      erp : TTY=console ; PWD=/home/erp ; USER=root ; COMMAND=/usr/bin/su
Apr 22 20:54:05 shared1 sudo[498804]: pam_unix(sudo:session): session opened for user root by LOGIN(uid=0)
Apr 22 20:54:05 shared1 su[498805]: (to root) erp on pts/2
Apr 22 20:54:05 shared1 su[498805]: pam_limits(su:session): Could not set limit for 'core' to soft=0, hard=-1: Operation not permitted; uid=0,euid=0
Apr 22 20:54:05 shared1 su[498805]: pam_unix(su:session): session opened for user root by erp(uid=0)
root@shared1 /home/erp# systemctl status ssh.socket
* ssh.socket - OpenBSD Secure Shell server socket
   Loaded: loaded (/lib/systemd/system/ssh.socket; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2022-04-11 04:58:57 BST; 1 weeks 4 days ago
   Listen: [::]:22 (Stream)
 Accepted: 15; Connected: 0;
Feb 28 22:48:46 shared1 systemd[1]: Listening on OpenBSD Secure Shell server socket.
Apr 11 04:58:57 shared1 systemd[1]: ssh.socket: Succeeded.
Apr 11 04:58:57 shared1 systemd[1]: Closed OpenBSD Secure Shell server socket.

Maybe I should try a restart?

I have only just started using screen so I just assumed all my containers would be the same. I guess to assume really does make an ass out of u and me!

Jeremy Davis's picture

Even the CT you have that is using ssh.service (not ssh.socket) has different ssh.socket config to mine:

Here's yours:

* ssh.socket - OpenBSD Secure Shell server socket
   Loaded: loaded (/lib/systemd/system/ssh.socket; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2022-04-11 04:58:57 BST; 1 weeks 4 days ago

And here's mine:

* ssh.socket - OpenBSD Secure Shell server socket
   Loaded: loaded (/lib/systemd/system/ssh.socket; disabled; vendor preset: enabled)
   Active: inactive (dead)

So yours is enabled and has been running (but has since been stopped or killed). Mine is disabled and hasn't ever run.

Richard's picture

Like this issue, I think this is caused by the sshd service crashing/stopping.

I think it's a fair assumption that if the service is stopped, the socket-based ssh session will die on logout and take screen with it.

Whereas if the service is running, processes are persistent.

Now I just need to figure out why my container sshd has the tendency to crash.

Jeremy Davis's picture

I've never struck that scenario, but I can see the value of it falling back to the socket if the service fails. Although it shouldn't fail in the first place really!

So long as you can get access via an alternate method to restart the crashed ssh daemon (e.g. 'pct enter VMID' - where VMID is your container's ID number), you could disable the ssh.socket. You'll then know straight away and be able to check what happened to the ssh service more easily.

systemctl disable --now ssh.socket

I suggest first checking the ssh.service journal:

journalctl -u ssh.service

It might also be useful to view other journal messages from around the same time. You can view the full journal for the last hour like this:

journalctl --since "1 hour ago"

You can also use '--since' and/or '--until' to limit the results. Format is "YYYY-MM-DD HH:MM:SS" e.g. "2022-07-12 23:15:00".
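
For example (the timestamps here are purely illustrative), to limit the ssh.service messages to a specific window:

journalctl -u ssh.service --since "2022-04-12 00:00:00" --until "2022-04-12 02:00:00"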

Feel free to share any logs, etc if you want a hand trying to work it out.

Chris's picture

I've been struggling with what seems to be the same problem for a couple of days before happening across this thread. If I log into a TKL container and start tmux/screen, either directly or via byobu, then they work correctly whilst I stay logged in. However, as soon as I log out, either manually or by killing the SSH connection, then tmux/screen is terminated. 

I've tried lots of things to try to fix this, but it's seemed odd that the basic functionality of tmux/screen just isn't working. I've tried fixes on three different TKL containers and none worked on any of the containers. I think two containers were Core and one MySQL, but I guess MySQL is based on Core anyway. 

After reading this thread I began to wonder if it's possibly a TKL issue. So, I created a standard Debian 10 container (not TKL based) on the same Proxmox 7.2 instance as my TKL containers. This container works correctly. I can log in via SSH, issue a byobu-tmux command, use byobu, and then when I log off/disconnect the underlying tmux process(es) stay running, so that when I connect next time the sessions are ready to use again.

I know this doesn't get any closer to why it's happening, but it does seem to suggest that it's something in the TKL config that's causing it rather than it being related to Proxmox.

It's getting late here now but I'll see if I can find any obvious differences between configs on the standard Debian container and the TKL containers.

Jeremy Davis's picture

Is the ssh.socket running on your server too (as noted by Richard above)? I still don't understand how or why that would or even could happen?! That's not the default config which we ship with!?

I also have no idea why I can't reproduce it?

If I can reproduce the issue, then I'm sure that I can fix it. But I can't reproduce it?! I launch a container, SSH in and start a screen session, exit, wait a few hours, SSH back in and the session is still running?!

I wonder what might be different between your (and Richard's) Proxmox install and/or config and mine?

Richard's picture

As I said in my last message in this thread, I'm pretty sure this is just because sshd has crashed on your container. Do a service sshd status to check.

To resolve you'll have to log out completely and restart the service through a local terminal/console connection. If you're logged in via a socket connection sshd appears to fail to start.

service sshd restart

Or I suppose you could just restart the container too
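
From the Proxmox host, the console route would look something like this (the VMID below is just a placeholder for your container's ID):

pct enter 100              # 100 = your container's VMID
service sshd restart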

Chris's picture

ssh.socket is active rather than ssh.service. That's not something I've set and I don't know what's caused the change from ssh.service. I have 10 TKL containers and one 'raw' Debian container running on the Proxmox instance and they're all set to use ssh.socket.

One thing I'd never noticed before is that there's no sshd process by default (when viewed from the local Proxmox console for a server). ssh.socket is listening on port 22 and starts an sshd process when I initiate an SSH connection into the server. I assume this is normal. If I use 'systemctl start sshd' then that creates an sshd process (based on ssh.service). After doing that, tmux/byobu works.
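
For what it's worth, a quick way to see which unit currently owns port 22 (ss is part of iproute2, which Debian installs by default):

ss -tlnp | grep ':22'       # socket activation shows systemd (pid 1) as the listener; ssh.service shows sshd
systemctl list-sockets      # lists active socket units and the services they activate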

I might just be describing the difference between ssh.service and ssh.socket though...I've never studied ssh in enough detail to have known that the two different options existed! 


Chris's picture

I've done some digging around and not found the source yet, but I've seen reference to an update (I assume Proxmox) that enabled ssh.socket in preference to ssh.service. It may be because Proxmox 7 is built on Debian 11.5 and ssh.socket is the default in Debian 11 but I can't confirm this.

A 'fix' is to mask ssh.socket and enable ssh.service, i.e.

systemctl mask ssh.socket

systemctl disable sshd

systemctl enable ssh

I'd do that from the console rather than being logged in via SSH. I suspect you might get logged out if you did it via SSH!

tmux and byobu (and presumably screen) work well after doing this. I don't know what this means for the future though. If ssh.socket is becoming the default then I guess ssh.service will become deprecated at some point.

Chris's picture

After enabling ssh you'll also need to:

systemctl start ssh


Jeremy Davis's picture

I really don't understand this issue. Out of interest, I just tried launching a clean TurnKey v16.1 Core LXC container on Proxmox (it is still v6.x, so perhaps that's a factor?).

SSH is set up exactly as I would expect it:

root@JED-TEST-ssh ~# systemctl status ssh.service
* ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2022-09-20 03:43:14 UTC; 1min 58s ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 7238 (sshd)
    Tasks: 5 (limit: 4915)
   Memory: 6.0M
   CGroup: /system.slice/ssh.service
           |-6252 sshd: root@pts/2
           |-6272 -bash
           |-7238 /usr/sbin/sshd -D
           |-7744 systemctl status ssh.service
           `-7745 less -X -R -F

Sep 20 03:43:14 JED-TEST-ssh systemd[1]: ssh.service: Found left-over process 6272 (bash) in control group while starting unit. Ignoring.
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: Starting OpenBSD Secure Shell server...
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: ssh.service: Found left-over process 6252 (sshd) in control group while starting unit. Ignoring.
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: ssh.service: Found left-over process 6272 (bash) in control group while starting unit. Ignoring.
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 20 03:43:14 JED-TEST-ssh sshd[7238]: Server listening on 0.0.0.0 port 22.
Sep 20 03:43:14 JED-TEST-ssh sshd[7238]: Server listening on :: port 22.
Sep 20 03:43:14 JED-TEST-ssh systemd[1]: Started OpenBSD Secure Shell server.
root@JED-TEST-ssh ~# systemctl status ssh.socket
* ssh.socket - OpenBSD Secure Shell server socket
   Loaded: loaded (/lib/systemd/system/ssh.socket; disabled; vendor preset: enabled)
   Active: inactive (dead)
   Listen: [::]:22 (Stream)
 Accepted: 0; Connected: 0;
root@JED-TEST-ssh ~# systemctl list-unit-files ssh*
UNIT FILE    STATE   
ssh.service  enabled 
ssh@.service static  
sshd.service enabled 
ssh.socket   disabled

4 unit files listed.
root@JED-TEST-ssh ~# systemctl is-enabled ssh.service           
enabled
root@JED-TEST-ssh ~# systemctl is-enabled ssh.socket
disabled

Note that ssh.service is "enabled" and ssh.socket is "disabled"!?! I did notice that ssh.socket shows "vendor preset: enabled" in its status output. My reading suggests that means it would be enabled by default. However, I tried purging openssh-server ('apt purge -y openssh-server' - "purging" removes the software and its config in /etc; "removing" just removes the software and leaves the config intact). After re-installing, the above remains the same (actually, the above was taken after I re-installed openssh-server, but it looks nearly identical to how it did before - note the log noise from when I removed it whilst still logged in via SSH! :).

We do tweak the default SSH config a little (mostly hardening, but we also allow password SSH login). But purging as I did above should return the SSH config to the Debian default.
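
I.e. roughly (best run from a console rather than over SSH, since it briefly removes sshd):

apt purge -y openssh-server       # removes the package and its config under /etc
apt install -y openssh-server     # reinstalls with Debian's default config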

Out of interest I stopped ssh.service and tried logging in, but that failed (I probably shouldn't be surprised...).

If one (or both) of you have the time and the energy, it'd be great if you could try the steps I note above on a clean local install of a TurnKey LXC container and report back. Also, if you could share the container config? TBH, I'm not sure it will help, but perhaps? FYI, here's my CT conf (node name: "pve" - VMID 113 - /etc/pve/nodes/pve/lxc/113.conf):

arch: amd64
cores: 1
hostname: JED-TEST-ssh.socket
memory: 512
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.1,hwaddr=AE:CD:D2:46:94:A1,ip=192.168.1.113/24,type=veth
ostype: debian
rootfs: local-lvm:vm-113-disk-0,size=8G
swap: 512
unprivileged: 1

So I still have no idea how the ssh.socket is even able to run on your servers? If I put that aside for a moment, then it appears that this behaviour may be caused by a race condition between ssh.service and ssh.socket? Although I'm still not clear on how the socket even gets enabled on your servers? It's definitely disabled OOTB for me!?!

Out of interest, I have spent a ton of time digging around online and it does appear that other OSes are affected, although there appears to be very little Debian/Ubuntu specific discussion?!

Arch Linux have removed ssh.socket from their OpenSSH package altogether - due to denial of service concerns - note that they also have a warning about it on their wiki. Red Hat also has some similar issues noted in these two bug reports. The only Debian bug that I could find that seemed somewhat vaguely related was an old one from 2015 - and as I say, it's only vaguely related (it's about the "socket" not abiding by sshd_config - a somewhat related issue, but not directly relevant).

So I'm pretty stumped. Having said that, I'm almost inclined to mask ssh.socket by default. At least on our Proxmox container builds, if not all builds. I don't imagine that will do any harm anywhere. Using the ssh.socket is considered non-standard config, so the expectation is that the user would need to configure that themselves. Having to also unmask the socket doesn't seem like a completely unreasonable requirement if someone did want to use it.
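
For reference, masking amounts to something like this (and is just as easy to undo):

systemctl mask ssh.socket      # links the unit to /dev/null so it can never be started
systemctl unmask ssh.socket    # undoes the mask, should socket activation ever be wanted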

Thoughts?

Richard's picture

I don't know how to confirm this, but my suspicion is that ssh.socket is only used as a fallback if ssh.service is not running.

And I suspect ssh.service crashes if the container is started and there is some kind of network error/conflict.

@Jeremy: you might like to try cloning an existing container (so both have the same IP), change the IP, start it, then see if sshd has crashed?

Jeremy Davis's picture

Thanks Richard. I'll give that a try and let you know.

Also, if you get a chance, I'd love it if you could test the v17.1 LXC builds that I've linked to in my reply to Chris below.

I'm particularly interested to hear how the patched builds go. Obviously I'd love to hear whether you experience this same issue but I'd also be particularly interested to hear how they perform in general (i.e. whether there is any other weirdness).

Chris's picture

Sorry, I don't think this is going to help!

I created a TKL Core 16.1.1 server then logged in using the console rather than SSH. As you can see below, both ssh.service and ssh.socket are enabled but only ssh.socket is active. I suspect that ssh.socket starts before ssh.service, so ssh.socket binds to port 22 before ssh.service tries to start. ssh.service then sees the port is already in use and fails.
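
If you want to see the ordering for yourself, the journal for the current boot should show the socket grabbing port 22 before the service tries to start:

journalctl -b -u ssh.socket -u ssh.service --no-pager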

I created two containers; the first time I installed the security patches when prompted in confconsole. Then I wondered if something in the patches was breaking things, so I created a second container where I didn't apply the patches. It made no difference though... both ssh.service and ssh.socket were enabled in both containers.

This is on Proxmox 7.2-7, which is based on Debian 11.5, rather than the Debian 10 that Proxmox 6 and the TKL Core container are based on. So I'm guessing it's related to a difference between Proxmox 6 and 7. I've seen some references to similar things on the Proxmox forum, e.g. [SOLVED] - SSH doesn't work as expected in LXC | Page 2 | Proxmox Support Forum, but no-one so far has offered an explanation of why this is happening.

systemctl status ssh.service

* ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2022-09-20 13:16:27 UTC; 1min 47s ago
     Docs: man:sshd(8)
           man:sshd_config(5)
 Main PID: 263 (code=exited, status=255/EXCEPTION)
      CPU: 8ms

Sep 20 13:16:27 test2 systemd[1]: Starting OpenBSD Secure Shell server...
Sep 20 13:16:27 test2 sshd[263]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Sep 20 13:16:27 test2 sshd[263]: error: Bind to port 22 on :: failed: Address already in use.
Sep 20 13:16:27 test2 sshd[263]: fatal: Cannot bind any address.
Sep 20 13:16:27 test2 systemd[1]: ssh.service: Main process exited, code=exited, status=255/EXCEPTION
Sep 20 13:16:27 test2 systemd[1]: ssh.service: Failed with result 'exit-code'.
Sep 20 13:16:27 test2 systemd[1]: Failed to start OpenBSD Secure Shell server.
Sep 20 13:16:27 test2 systemd[1]: ssh.service: Consumed 8ms CPU time.

systemctl status ssh.socket

* ssh.socket - OpenBSD Secure Shell server socket
   Loaded: loaded (/lib/systemd/system/ssh.socket; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2022-09-20 13:16:27 UTC; 2min 15s ago
   Listen: [::]:22 (Stream)
 Accepted: 0; Connected: 0;

Sep 20 13:16:27 test2 systemd[1]: Listening on OpenBSD Secure Shell server socket.
Sep 20 13:16:27 test2 systemd[1]: ssh.socket: Succeeded.
Sep 20 13:16:27 test2 systemd[1]: Closed OpenBSD Secure Shell server socket.

systemctl list-unit-files ssh*

UNIT FILE    STATE
ssh.service  enabled
ssh@.service static
sshd.service enabled
ssh.socket   enabled

4 unit files listed.

systemctl is-enabled ssh.service

enabled

systemctl is-enabled ssh.socket

enabled

Container config...

arch: amd64
cores: 1
features: nesting=1
hostname: test2
memory: 2048
net0: name=eth0,bridge=vmbr1,firewall=1,gw=10.10.10.1,hwaddr=9A:05:13:41:26:66,ip=10.10.10.204/24,type=veth
ostype: debian
rootfs: local-zfs:subvol-211-disk-0,size=32G
swap: 2048
unprivileged: 1
Jeremy Davis's picture

Awesome! Thanks Chris. Although unfortunately, you are right - it doesn't really help much. Thanks also for sharing the link to relevant threads on the Proxmox forums. After a bit of reading over there, it seems to be a relatively common (albeit somewhat intermittent) issue that we've inherited from Debian.

And we're really close to publishing the new v17.x LXC containers for Proxmox. So it would be extremely useful if you could test our build(s) before we publish them.

We have already had users building v17.x containers themselves and testing them, and reports have been positive (hence why I was planning to release the containers). So the initial build (that I had already prepared for release - probably early next week) can be downloaded via these links (see the important note at the bottom):

debian-11-turnkey-core_17.1-1_amd64.tar.gz
debian-11-turnkey-core_17.1-1_amd64.tar.gz.hash

However, thinking about this some more, I've tweaked the LXC container build process to mask the ssh.socket at build time. That should ensure that the issue can't occur. Hopefully there isn't any negative side effect?! I can't imagine that there would be, but I still don't understand how this issue can occur in the first place - so who knows!?

So here are some patched builds to test (again see important note below):

debian-11-turnkey-core-patched_17.1-1_amd64.tar.gz
debian-11-turnkey-core-patched_17.1-1_amd64.tar.gz.hash

Important note: These will only exist temporarily. Once published, these temporary "test" builds will be removed and "official" builds will be on the mirror. Also, please note that whilst I've uploaded the hash files, the hash file is not signed (as it will be when properly released - I don't have access to the signing key). Despite this, you can still use the SHAs (in the hash file) to confirm image integrity.
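
For example, assuming the hash file contains standard SHA256 sums (adjust to whatever format the file actually uses), checking the patched build would look something like:

sha256sum debian-11-turnkey-core-patched_17.1-1_amd64.tar.gz
cat debian-11-turnkey-core-patched_17.1-1_amd64.tar.gz.hash    # compare the SHA256 values by eye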

Chris's picture

I've created containers using both unpatched and patched versions. I can confirm that the unpatched version behaves the same way as the v16 build, i.e. both ssh.service and ssh.socket are enabled with ssh.service inactive on boot because ssh.socket grabs port 22 first.

The patched version behaves as you hoped. ssh.socket is masked and inactive, with ssh.service active on boot.

Both builds allow inbound ssh connections, i.e. using ssh.socket for the unpatched version and ssh.service for the patched build.

I've not been able to do more extensive testing yet because I need to plumb the test server into my proxy environment to get improved access to it. I wish the Proxmox console interface was better! For now though the patch seems to work and minimal testing hasn't thrown up any obvious side-effects of masking ssh.socket.

I can provide captures of the systemctl output from each server if you need it but I don't think it gives you any useful extra info.

Jeremy Davis's picture

Thanks Chris. That's really valuable feedback. I'll do some testing myself, but unless I hit any obvious issues, I think I'll run with that patch for the v17.x Proxmox LXC builds.

Thanks again and please do share any other feedback you have if/when you test a bit more.
