khantroll's picture

Hi All! My Moodle server went down this morning, and when I logged in to the Turnkey Hub I received this error:

  • The Role ARN is no longer valid, please verify it is setup correctly.

I'm not sure what happened, though I think I have an idea.

Does anyone know how to fix this? 

Jeremy Davis's picture

That error message means that the Hub cannot connect to your AWS account. The fact that your server has also "disappeared" makes me inclined to suspect that there is something wrong with your AWS account, although it could be coincidental?!

FWIW, you can double check the Hub's AWS account page. In the "Amazon service connectivity checks" box, you will see a number of checks; my guess is that they will all be failing, given the error message you've reported. In the top right of that box, you'll see a "Help" text link. If you click that, it should open a pop-over with some things to check. Regardless, I'll provide my thoughts on the likely cause below.

If/when this occurs on an existing Hub account that was previously working fine, there are a number of factors that may be the cause. The ones that seem most likely/possible are:

  • Your AWS account has been suspended (e.g. outstanding charges caused by a card with insufficient funds, expired or blocked card, etc.).
  • Your AWS account has been removed/deleted (by someone who has sufficient access/control over it).
  • Your AWS IAM role and/or the policy that allows the Hub access has been removed, renamed or otherwise edited so the Hub can't use the info it has to connect.

If you are the only one who has access to the AWS account and you are sure that you have done nothing in your AWS account to cause this, then billing issues are by far the most likely. Have a look in the AWS billing area. Even if that all appears ok on face value, then I'd personally still double check that there is a valid payment method selected as default on the AWS billing methods page.

For what it's worth, AWS will send out repeated warnings (via email) before they suspend your account. If it turns out that billing issues were the cause, I suggest that you make sure the AWS contact email is up to date and investigate why you may not have seen the AWS warning messages (e.g. perhaps getting tagged as spam?).

Also, if you have multiple AWS accounts, please make sure that you are logged into the correct one. To double check, you can cross reference the 12-digit number in the middle of the ARN (as displayed near the top of the Hub's AWS account page) against the AWS account ID number (displayed near the top of the AWS account profile page).

If the AWS billing seems ok, then you'll need to check the IAM role (although note that the IAM role changing would almost certainly not impact your server status in and of itself). If you used the default Hub role name ('turnkeyhub'), then you should find it here; otherwise, search amongst the IAM roles. If you're unsure of the role name, check the ARN in the Hub; the role name will be the very last part, after the slash. I.e. by default the ARN will be "arn:aws:iam::123456789012:role/turnkeyhub" and the default role name is 'turnkeyhub' (and the 12 digit number is your AWS account ID).
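
To make the ARN layout concrete, here is a minimal Python sketch that splits a role ARN into the account ID and the role name. The function name parse_role_arn is hypothetical (it is not part of the Hub or of any AWS tool), and the ARN used with it should be your own; the one in the test below is just the placeholder from the paragraph above.

```python
from typing import Tuple


def parse_role_arn(arn: str) -> Tuple[str, str]:
    """Split an IAM role ARN into (account_id, role_name)."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    # IAM is a global service, so the region field is empty.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    account_id, resource = parts[4], parts[5]
    if not resource.startswith("role/"):
        raise ValueError(f"not a role ARN: {arn!r}")
    # The role name is the very last part, after the final slash.
    role_name = resource.rsplit("/", 1)[1]
    return account_id, role_name
```

The account ID it returns is the number to compare against the ID shown on the AWS account profile page, and the role name is what to search for amongst the IAM roles.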

If the role exists and all seems ok, then the next step is to check the policy attached to it. You'll find the recommended policy by clicking the question mark next to "Role ARN" on the Hub's AWS account access page.

If that too checks out, then I'm really not sure what else it could be?! I can only suggest that you recheck connectivity in the hope that it was just some AWS issue that has since been resolved. To recheck in the Hub, click the "Help" text link (on the AWS account access page - to the right of "Amazon service connectivity checks") and click the "recheck connectivity" button.

If they all still fail, then there is definitely something up with your AWS account, or at least the IAM role! You could also double check the server from the EC2 area of the AWS console. Note that unlike the Hub, the AWS console is split into regions, so you'll need to match the region (from the dropdown towards the top right) with the region where your server is/was. If you don't recall which region it was running in, you can either do trial and error (i.e. check them all, one by one) or, if the server still shows in the Hub (not sure if it will with the AWS connection broken?), check the region there.
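
If clicking through every region in the console is tedious, the same trial-and-error check could be scripted. What follows is only a sketch under stated assumptions: find_instances and make_ec2_client are hypothetical names, and the code assumes you hand it something that behaves like a boto3 EC2 client (i.e. it has a describe_instances() method returning the usual "Reservations" / "Instances" structure). It does not handle pagination, so it assumes only a handful of instances per region.

```python
from typing import Any, Callable, Dict, Iterable, List


def find_instances(make_ec2_client: Callable[[str], Any],
                   regions: Iterable[str]) -> List[Dict[str, str]]:
    """Return id, state and region for every EC2 instance in the given regions."""
    found = []
    for region in regions:
        ec2 = make_ec2_client(region)  # one client per region
        response = ec2.describe_instances()
        # Instances are grouped into "reservations" in the response.
        for reservation in response.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                found.append({
                    "region": region,
                    "id": instance["InstanceId"],
                    "state": instance["State"]["Name"],  # e.g. "running", "stopped"
                })
    return found
```

With boto3 installed and AWS credentials configured, it might be called as find_instances(lambda region: boto3.client("ec2", region_name=region), ["us-east-1", "ap-southeast-2"]), where the region list is just an example.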

Please let me know how you go.

khantroll's picture

Hi Jeremy, 

Per your suggestion, I have checked my billing information. It doesn't show any outstanding or past-due bills. However, it also doesn't show any charge on the card for April. I don't know whether that is because there wasn't enough usage to generate a charge (though there is a paid invoice for it) or something else, so billing may still be a suspect. I'll have to reach out to their billing department to verify that information.

I don't have any emails saying anything about suspension, and I am getting usage emails from them. 

This is the content of my Turnkey Hub policy. It should be right, as it has been working for more than six months, but as you said, changes to the role or policy are a possible cause. I added a different policy for a different project, but it had nothing to do with this one and it was days before this problem.

Can you verify that it is right? 

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:*",
                "route53:*",
                "route53domains:*",
                "cloudwatch:*"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "s3:ListAllMyBuckets",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::tklbam-*"
        },
        {
            "Action": "sts:DecodeAuthorizationMessage",
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Jeremy Davis's picture

Comparing your policy against the default, other than the indents (which shouldn't matter; what you've pasted has a larger indent than the default) it looks right to me. FWIW, here's the default:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:*",
        "route53:*",
        "route53domains:*",
        "cloudwatch:*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Action": "s3:ListAllMyBuckets",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::tklbam-*"
    },
    {
      "Action": "sts:DecodeAuthorizationMessage",
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
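
For anyone wanting to do the same comparison without eyeballing it, a small Python sketch along these lines parses both documents and compares the resulting structures, so indentation and other whitespace drop out. The name same_policy is hypothetical. Note one assumption: lists (such as the "Action" list) are compared in order, so two policies that list the same actions in a different order would be reported as different even though AWS would treat them the same.

```python
import json


def same_policy(policy_a: str, policy_b: str) -> bool:
    """Return True if two JSON policy documents have identical content.

    Whitespace and the order of keys inside an object are ignored;
    the order of items inside lists is NOT ignored.
    """
    return json.loads(policy_a) == json.loads(policy_b)
```

Paste the policy from the IAM console into one string and the recommended policy from the Hub into the other; a False result means something other than indentation differs.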

So if the ARN noted in the IAM role in the AWS console exactly matches the ARN that the Hub shows, and the policy is right, then AFAICT it must be something up with your AWS account?!

Also did you see if you could find the server and investigate from that angle?

khantroll's picture

Yes sir, the ARN matches. 

When I go to the Turnkey Hub, all of the checks fail. When I click on servers, it tells me that my Amazon Account is no longer active.  When I click Re-Check, it says Access Denied. 

There is an account number listed under trust relationships. I'd presume it's Turnkey, but I'm hesitant to post it here to verify that. 

Is that accurate feedback, or is it telling me that because it isn't getting an answer? 

I can log in to my account just fine, and I have server charges from this month. It tells me that the billing cycle isn't over, and there doesn't seem to be anything saying my account has been disabled. 

Do you have any idea where I might be able to verify that? 

I think I'm going to need to buy a support plan and reach out to Amazon. But that's going to be more time that this thing is down. I was hoping it was some kind of setting or something I could work through. 

Jeremy Davis's picture

So if the ARN matches, there must be something within your AWS account blocking the Hub's access! Essentially, all the "recheck connectivity" button does is check whether the Hub can read from your AWS account. The fact that those checks are failing suggests that there is something blocking the Hub from accessing your account.

Re the "Trust relationships", the "Trusted entities" number should be "096457495696" (the AWS account the Hub uses to access user AWS accounts). The bit that does need to remain secret is the "ExternalId", but you can double check that by clicking the question mark in brackets "(?)" next to "Role ARN" on the Hub's AWS account access page and looking for the box where it says "External ID".
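
As a rough illustration of what to look for in the role's trust relationship JSON, here is a Python sketch that pulls out the trusted account IDs and checks that an external ID condition is present. The function names are hypothetical, and the sketch assumes the usual trust policy shape: an "Allow" statement whose "Principal" has an "AWS" entry (either a bare account ID or an ARN such as "arn:aws:iam::<account-id>:root") and a "Condition" block keyed on "sts:ExternalId". It deliberately never prints the external ID itself.

```python
from typing import Any, Dict, List

HUB_ACCOUNT_ID = "096457495696"  # the Hub's AWS account, as quoted above


def _statements(trust_policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    # "Statement" may be a single object or a list of objects.
    statements = trust_policy.get("Statement", [])
    return [statements] if isinstance(statements, dict) else statements


def trusted_accounts(trust_policy: Dict[str, Any]) -> List[str]:
    """Return the AWS account IDs that the role's trust policy allows."""
    accounts = []
    for stmt in _statements(trust_policy):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        if isinstance(aws, str):
            aws = [aws]
        for entry in aws:
            # ARN form: arn:aws:iam::<account-id>:root -> field 4 is the ID.
            accounts.append(entry.split(":")[4] if entry.startswith("arn:") else entry)
    return accounts


def requires_external_id(trust_policy: Dict[str, Any]) -> bool:
    """Return True if any statement has a condition on sts:ExternalId."""
    for stmt in _statements(trust_policy):
        for operands in stmt.get("Condition", {}).values():
            if "sts:ExternalId" in operands:
                return True
    return False
```

If HUB_ACCOUNT_ID is in the list returned by trusted_accounts() and requires_external_id() is True, the trust relationship at least has the expected shape; the external ID value itself still needs to be compared by hand against the one shown in the Hub.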

Having said that though, none of that should have changed, unless someone has changed something within your AWS account.

What you note re billing certainly sounds like you haven't been suspended. It sounds like everything is up to date exactly as it should be. Everything else you've noted sounds like it should be working ok too.

I must admit I still find it weird that your server stopped at (around?) the same time though?! As I say, that may have been coincidental, but it's one hell of a coincidence. And the only thing that I can imagine would cause both to occur simultaneously is AWS account suspension - which we've already ruled out?!

TBH, other than AWS account suspension, I've never come across this before. No one else has reported any issues like this recently and everything within my own personal account is working fine. So whilst that isn't conclusive, it does suggest that it's something specific to your AWS account. If I check my "turnkeyhub" IAM role I can see that the Hub accessed it a few minutes ago, I assume when I clicked the "recheck connectivity" button.

As a bit of an aside, did you try to see if you could find the server in question within the AWS console? If you can, I suggest trying to start it and see what happens then.

khantroll's picture

Yeah, the trusted entity number matches. 

I think we may have a clue to the problem. There seem to be two servers on my account, created around the same time back in September. Neither is running. They both show as "stopped". But when I try to launch them, it takes me to "Choose an Amazon Image". 

Any idea why? 

khantroll's picture

So, I'm dumb. I was clicking "Launch Instance" instead of right clicking and choosing "Start". The servers are started. They say running. That was the problem. Somehow, the servers got stopped. 

I have no idea why they would have stopped. I know I didn't do it. TBH, I had to figure out where they were in order to start them manually as I've never had to do anything with AWS since I started using this.

My only thought is that it had something to do with the weird billing in April. But if that is the case, why didn't it happen sooner?  

Jeremy Davis's picture

Yes it's very strange! At least your server is up and running! But I still don't understand why the Hub connection isn't working?! And it still seems incredibly coincidental that these 2 things occurred at the same time!

I'll certainly speak with my colleague Alon (the "Hub Daddy") and see if he has any ideas, but I'm pretty sure that we've gone over everything and it all checks out, so I'm really stumped.

Regardless, I think that it may be worth reaching out to AWS support (if you do it as a "billing" enquiry, they won't expect you to pay). I suggest noting that your server(s) were stopped without any intervention from you and the IAM role no longer allows connection (again with no changes from you). Also note that you double checked the role, the policy and the credentials used to access the IAM role and everything checks out.

khantroll's picture

When I said that was the problem, I meant that the server is back up. Turnkey still isn't connecting to it. It does have a new public IP. Would that have something to do with it? 

Jeremy Davis's picture

When you stop an AWS server, it is disconnected from the physical hardware that it was running on (an AWS instance is essentially just a VM running on AWS hardware). Then when it's restarted, it will be allocated the next available hardware slot (so almost always on different physical hardware to what it was running on before). Unless the VM has an Elastic IP attached (essentially a static IP for an AWS instance) as it boots, it will be allocated a public IP via AWS DHCP.

Note that this means that a reboot actually has a slightly different "real world" effect to a stop/start cycle. A reboot retains the current hardware slot that your instance is using and does not acquire a fresh IP.

I'm almost certain that the issue with Hub connection is purely related to connecting to your account via the IAM role. I.e. an issue that we've somehow missed and/or something else within AWS and/or your account blocking the connection. TBH, I'm still completely unclear how and/or why that has occurred and have run out of ideas and feel a bit stumped...

The only thing I can think of trying is perhaps manually removing the role and recreating it?! In theory that should make no difference, as it would only recreate what we've already confirmed should already be working. Although I guess it's a bit like "turning it off then turning it on again"?! :)

So long as the ARN remains the same and you use the details as noted within the Hub (as I've noted previously) then it should work. The ARN will be the same as what it is now so long as it's within the same AWS account and you give the role the same name (by default "turnkeyhub").

Personally, I'd probably leave that as a "last resort" though. I'd try getting in touch with AWS billing support and check with them if your account has been frozen at some point? Or perhaps if some other event has occurred that may have caused this.

FWIW, AWS occasionally decommissions hardware. When that happens, all instances running on that hardware have to be stopped (and if you don't do it yourself, they will). So there is a possibility that that may have occurred, but you should have numerous notifications from them (via email). They wouldn't have done that intentionally without notice, and the chances of that affecting 2 instances at the same time seem incredibly low!

Bottom line, it remains a mystery...
