Hello,

The Canvas AMI installation continues to fail for us. We get nginx errors when the system tries to install the certificate. Can you please advise how to resolve this?

Also, if we exit during the installation process, is there a way to resume the installation?

Thanks.

-Chris

Jeremy Davis's picture

Thanks for taking the time to report your issue.

To answer your last question first: assuming you mean the questions that you are asked on first SSH login, yes, you can rerun those like this:

sudo turnkey-init

Regarding your issue, when you say that you "get nginx errors when the system tries to install the certificate", I'm not clear exactly what you mean.

For starters, we use Apache as the webserver (not Nginx). Canvas is run via Passenger as an Apache module (as recommended by Canvas). So I'll just assume that you meant Apache, but please clarify if that's not the case.

Regarding the certificate itself, is this one you've already got that you're trying to upload? If so, can you please tell me a bit more about it, such as what format it is (certificates come in a range of different formats, and even within a format they can be constructed differently). If you're not completely sure, even the names of the files (including their file extensions) that you were trying to load might give me a hint at what they are and what might be required to use them.

Alternatively, if you're trying to get a free certificate via Let's Encrypt, are you using our built-in tool, or something else? If you aren't using Confconsole (start it via 'sudo confconsole') to get a Let's Encrypt cert, then could you please be more explicit about what you've tried and how you did that.

Also, could you be more explicit about the error message(s) you are getting and where you are seeing them? Sharing the verbatim error message can be really useful. Does it occur in your browser? On the command line? In the logs?


FWIW I've tested generating a Let's Encrypt certificate on an AWSMP TurnKey Canvas server using our built-in tools and it "just works" for me. Let me step you through what I did:

I started the server and configured my desired DNS name to point to the IP address of my server (note that if I wasn't just testing, I'd attach an Elastic IP to my server before doing that). I logged in via SSH (using my domain, to double check that the DNS was working as it should). I ran through the initialisation hooks and set the domain to my custom DNS name. I double checked in my browser that everything worked as expected (it did). I then used Confconsole (a built-in tool we provide; it can be launched via 'sudo confconsole') to get a Let's Encrypt certificate. That appeared to succeed, so I double checked in my browser, specifying "https", and yes, it was working as expected. (If I wasn't just testing, I'd then enable automatic certificate renewal within Confconsole too.)

Hello,

Thanks, yes, 'turnkey-init' got me back to the screen I wanted so that I could configure the setup.

The 'confconsole' is what I'm using to try to get the certificate. Here's the full error message:

[2020-07-28 01:08:30] dehydrated-wrapper: INFO: started
[2020-07-28 01:08:30] dehydrated-wrapper: INFO: found nginx listening on port 80
[2020-07-28 01:08:30] dehydrated-wrapper: INFO: stopping nginx
Failed to stop nginx.service: Unit nginx.service not loaded.
[2020-07-28 01:08:31] dehydrated-wrapper: INFO: running dehydrated
ERROR: Challenge is invalid! (returned: invalid) (result: {
  "type": "http-01",
  "status": "invalid",
  "error": {
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "Invalid response from http://canvas.example.com/.well-known/acme-challenge/TkDoA-02wZ_vCrfpUnD... [WAN_IP_HERE]: \"\u003c!DOCTYPE html\u003e\\n\u003chtml class=\\\"scripts-not-loaded\\\" dir=\\\"ltr\\\"   lang=\\\"en\\\"\u003e\\n\u003chead\u003e\\n  \u003cmeta charset=\\\"utf-8\\\"\u003e\\n  \u003clink rel=\\\"preconnect\\\"\"",
    "status": 403
  },
  "url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/6154046908/_WgSeg",
  "token": "TkDoA-02wZ_vCrfpUnDOtWSjgRLdRHNkI30gdOOZ0MA",
  "validationRecord": [
    {
      "url": "http://canvas.example.com/.well-known/acme-challenge/TkDoA-02wZ_vCrfpUnD...,
      "hostname": "canvas.example.com",
      "port": "80",
      "addressesResolved": [
        "WAN_IP_HERE"
      ],
      "addressUsed": "WAN_IP_HERE"
    }
  ]
})
[2020-07-28 01:08:38] dehydrated-wrapper: WARNING: Something went wrong, restoring original cert, key and combined files.
[2020-07-28 01:08:38] dehydrated-wrapper: INFO: (Re)starting nginx
[2020-07-28 01:08:38] dehydrated-wrapper: INFO: (Re)starting stunnel4@shellinabox.service
[2020-07-28 01:08:38] dehydrated-wrapper: INFO: (Re)starting stunnel4@webmin.service
[2020-07-28 01:08:38] dehydrated-wrapper: WARNING: Check today's previous log entries for details of error.

If anything else is needed for clarification please let me know.

Also, after running the 'turnkey-init' the 'admin' password I set up doesn't let me in at 'http://canvas.example.com/login/canvas'. Should that be functional at this point or is the cert required?

Thanks.

-Chris

Jeremy Davis's picture

Glad to hear that 'turnkey-init' got you going again. Thanks too for the additional info.

Hmm, yeah, that output clearly suggests that it's finding Nginx when checking which webserver is using port 80 (the standard HTTP port). That's very weird and worthy of further investigation. But before we go there, it also appears that your server is returning a 403 error when the Let's Encrypt servers try to connect. Let me unpack that a little further first. The custom simple server that we provide to serve the challenges should never return a 403, so that's really weird too!
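Both of those details can be pulled out of a saved copy of the Confconsole output. The following is only a rough sketch: the function names are made up for illustration (they are not part of Confconsole or dehydrated), and the patterns assume the log lines look exactly like the ones posted above.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.
# Both read the saved Let's Encrypt output on stdin.

# Print the name of the webserver that dehydrated-wrapper reported on port 80.
found_webserver() {
    sed -n 's/.*dehydrated-wrapper: INFO: found \(.*\) listening on port 80.*/\1/p'
}

# Print the numeric HTTP status from the ACME error JSON.
# The quoted "status": "invalid" line is skipped because it is not a number.
challenge_status() {
    sed -n 's/^[[:space:]]*"status":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example input, copied from the output posted in this thread.
sample='[2020-07-28 01:08:30] dehydrated-wrapper: INFO: found nginx listening on port 80
  "status": "invalid",
    "status": 403'

printf '%s\n' "$sample" | found_webserver
printf '%s\n' "$sample" | challenge_status
```

With the sample input above, the first call should print the webserver name and the second the HTTP status code, which are the two things that look wrong in this case.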

The log notes that the domain is "canvas.example.com". Is that actually what you've set (i.e. one that you don't own and doesn't redirect to your server)? Or did you change the output so as not to publicly show the domain when you posted here? If your server is configured with "canvas.example.com" then that will never work as the process requires that your domain points to your server (the URL that it checks is dynamically generated by Let's Encrypt on the fly). If you share your domain and/or public IP, I can have a closer look from here and/or double check DNS records are appropriately set. There is no real reason not to publish the IP and/or domain but if you are uncomfortable with that please email it to support AT turnkeylinux.org (and note that it's related to this thread).
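For anyone who wants to check the DNS side themselves first, here is a minimal sketch. It assumes a glibc system where `getent ahostsv4` is available, and the function names are hypothetical. The expected IP has to be supplied by hand (for example the Elastic IP), because an AWS instance normally only sees its private address on its own network interface.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Succeed if the expected IP ($1) appears among the addresses read
# from stdin (one per line). -x and -F force an exact whole-line match.
ip_in_list() {
    grep -qxF -- "$1"
}

# Succeed if domain ($1) resolves to the expected IPv4 address ($2).
# Assumes glibc getent, whose ahostsv4 output starts each line with the address.
domain_points_at() {
    getent ahostsv4 "$1" | awk '{print $1}' | ip_in_list "$2"
}

# Usage on the server (domain and IP are the placeholders used in this thread):
#   domain_points_at canvas.example.com WAN_IP_HERE && echo "DNS looks right"
```

This only confirms that the name resolves to the address you expect. It says nothing about a CDN or proxy sitting in front of the server.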

If you are not actually using "canvas.example.com" (and are instead using an actual domain that you control), another thing to double check is whether you are using some sort of CDN (e.g. Cloudflare, AWS CloudFront, etc). Having your server fronted by a CDN will often cause issues, and you are often better off leveraging the certificate functionality of your CDN instead. Again though, sharing the info with me will allow me to investigate a little more, and I might be able to see something from here.

To circle back to the Nginx bit, could you please give me the output of the following commands:

turnkey-version

(Will give me the exact TurnKey version that you are using)

sudo netstat -tlnp

(Will show me what ports are in use and which application is using them)

Also, please share the status of the Nginx service. It shouldn't exist, so it's fine if this throws an error, but let's double check. Even if it does throw an error, please post the output anyway.

sudo systemctl status nginx
[edit - fixed typo; it should be "systemctl"...]
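The netstat listing can be long, so a small filter that shows only what is listening on the web ports may help. This is a rough sketch, assuming the column layout that `netstat -tlnp` prints on Linux; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.
# Reads `netstat -tlnp` output on stdin and prints "port program"
# for every listener on port 80 or 443.
web_listeners() {
    awk '$6 == "LISTEN" {
        n = split($4, parts, ":")       # local address, e.g. 0.0.0.0:80 or :::22
        port = parts[n]                 # the port is the last colon-separated piece
        if (port == "80" || port == "443") {
            sub(/^[0-9]+\//, "", $7)    # drop the leading PID and slash
            sub(/:$/, "", $7)           # drop a trailing colon, if any
            print port, $7
        }
    }'
}

# Usage on the server:
#   sudo netstat -tlnp | web_listeners
```

On a TurnKey Canvas server set up as intended, the program shown for ports 80 and 443 would be expected to be Apache rather than Nginx.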
Jeremy Davis's picture

Sorry missed your last questions...

FWIW on AWS Marketplace our servers use a Linux user called "admin". So the "admin" password is for logins that use the Linux user, i.e. SSH and Webmin. So make sure it's a really good one!

The password that you set for Canvas should be used in conjunction with the email address that you set to log into the Canvas admin account via the WebUI.

Jeremy Davis's picture

Chris wrote:

The server version is running:
turnkey-canvas-16.0-buster-amd64
The open ports come back as:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 127.0.0.1:6379   0.0.0.0:*        LISTEN  8393/redis-server 1
tcp        0      0 0.0.0.0:45263    0.0.0.0:*        LISTEN  26177/Passenger App
tcp        0      0 127.0.0.1:10000  0.0.0.0:*        LISTEN  26195/perl
tcp        0      0 0.0.0.0:80       0.0.0.0:*        LISTEN  24755/nginx: master
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  2227/sshd
tcp        0      0 127.0.0.1:5432   0.0.0.0:*        LISTEN  1057/postgres
tcp        0      0 127.0.0.1:9977   0.0.0.0:*        LISTEN  26652/python3
tcp        0      0 127.0.0.1:25     0.0.0.0:*        LISTEN  1176/master
tcp        0      0 0.0.0.0:443      0.0.0.0:*        LISTEN  24755/nginx: master
tcp        0      0 127.0.0.1:12319  0.0.0.0:*        LISTEN  26104/shellinaboxd
tcp        0      0 0.0.0.0:12320    0.0.0.0:*        LISTEN  26065/stunnel4
tcp        0      0 0.0.0.0:12321    0.0.0.0:*        LISTEN  26089/stunnel4
tcp6       0      0 ::1:6379         :::*             LISTEN  8393/redis-server 1
tcp6       0      0 :::22            :::*             LISTEN  2227/sshd
I can't find the 'nginx' listed as a service. The 'symtemctl status nginx' brings back:
bash: symtemctl: command not found
using 'service nginx status' doesn't find a matching service:
Unit nginx.service could not be found.
I'm guessing it is running under one of the other services
[Chris then listed the other services he had running]

First up, apologies for my typo. I've fixed it in my above post, but the command should have been "systemctl", not "symtemctl". Doh!

Regardless, the additional info has given me enough to work with and I'm pretty sure that I've worked out the issue!

So essentially, what I think is happening is a "race condition" between Nginx and Apache. We intend for Canvas to run (via the Rails application server; Passenger) on Apache (i.e. via mod_passenger). However, it appears that there is a redundant config still lurking that tries to run Passenger in "standalone" mode (which actually uses Nginx under the covers).

I'm not really sure why it's never caused us any issues during testing. I suspect that part of the reason is that, at face value, everything appears to be working fine (the login page loads, login works, etc). It's only once you try to get a Let's Encrypt cert or use the API that the issue would start to show (and only if Nginx was running). Others have likely hit this too, but because it's intermittent (and we've never hit it ourselves under circumstances where we'd notice) we've never been completely sure what might be going on. After all, we don't even install Nginx! It turns out that it's part of the source install of Passenger that we use.

Anyway, this is still theoretical (although I'm fairly confident). To confirm I'm right, please try the following:

sudo systemctl stop passenger
sudo systemctl restart apache2

Then double check that your Canvas instance is still running (via your web browser). The initial page load may take a little while, but Canvas should continue to work fine.

Assuming that everything is still ok, then please disable the (standalone) "passenger" service; like this:

sudo systemctl disable passenger

Assuming the above all works as I expect, Let's Encrypt should now work fine. Please give it a go and let me know.
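For reference, the same steps can be wrapped in a small script with a dry-run switch, so the commands can be reviewed before anything is changed. This is only a sketch: the function names and the DRY_RUN variable are made up for illustration, it assumes it is run as root, and it does not replace checking Canvas in the browser before disabling the service.

```shell
#!/bin/sh
# Hypothetical wrapper; names and the DRY_RUN variable are illustrative only.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the standalone Passenger service (which starts Nginx),
# restart Apache so that it owns ports 80 and 443, then disable
# the standalone service so it does not come back on reboot.
# Each step only runs if the previous one succeeded.
disable_standalone_passenger() {
    run systemctl stop passenger &&
    run systemctl restart apache2 &&
    run systemctl disable passenger
}

# Preview the commands without changing anything:
#   DRY_RUN=1; disable_standalone_passenger
# Apply them for real (as root):
#   disable_standalone_passenger
```

In dry-run mode the function prints the three systemctl commands, each prefixed with "+", in the order they would run.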

[edit] I've just posted a bug report regarding this. We will aim to publish a new appliance including this fix within the next week or so.
