Jeremy Davis's picture

This is a thread to discuss the (ongoing) work of developing a new install method for the next (v15.2) release of the TurnKey GitLab appliance. It is a continuation of the discussion (inadvertently) kicked off by Royce with this comment. FWIW, there is also a related open feature request regarding this on our tracker.

Summary of things so far:

  • There are maintenance issues with the current GitLab "source install" method (both for TurnKey and for end users).
  • As such, following discussions (both internal and public) it has been decided the best path forward is to use the GitLab Omnibus install for future releases (v15.2 onwards) of the GitLab appliance.
  • Jeremy (i.e. me) will do the preliminary work to develop a "base" GitLab appliance which has GitLab installed via Omnibus package.
  • Tim and Royce (and anyone else who is interested) will assist with testing and/or further development (via feedback and/or code and/or discussion, etc).
  • Tim and Royce (and others?) will also assist with testing migration from recent versions of TurnKey GitLab.
  • Once we have a working product that at least matches the functionality of the current GitLab appliance, plus some base documentation on how to migrate, etc, we'll release it as the next (v15.2) GitLab appliance release.

We'll then look at further improvements to make it (much) better than the current appliance. That will include easy config to:

  • Configure GitLab settings to support AWS as an external data store for large data items (i.e. git-lfs using AWS S3 as backing - see the sketch after this list).
  • Configure GitLab settings for Mattermost connectivity.
  • Configure SSL Certificates (TKL already provides basic server-wide Let's Encrypt TLS certs - but we'll make sure it fulfills the needs of GitLab, including any sub-domains etc).
  • Configure a RAM-disk swapfile.
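
To give a feel for what that "easy config" might wrap for the first item, here's a rough sketch of S3-backed git-lfs settings in /etc/gitlab/gitlab.rb. The setting names follow GitLab's Omnibus docs of roughly this era (and may differ between GitLab versions); the bucket name and credentials are placeholders:

cat >> /etc/gitlab/gitlab.rb <<'EOF'
# store git-lfs objects in S3 instead of on the local disk
gitlab_rails['lfs_enabled'] = true
gitlab_rails['lfs_object_store_enabled'] = true
gitlab_rails['lfs_object_store_remote_directory'] = "my-gitlab-lfs-bucket"
gitlab_rails['lfs_object_store_connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'aws_access_key_id' => 'REPLACE_ME',
  'aws_secret_access_key' => 'REPLACE_ME'
}
EOF
gitlab-ctl reconfigure    # apply the new settings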

Anything I've missed guys?!

Let the discussion continue... :)

OnePressTech's picture

Here is a good VM config wizard sample doc from GitLab:

https://docs.gitlab.com/ce/install/aws/

This article has two interesting reference points:

1) How to create a high-availability GitLab configuration on AWS

2) How to configure the AWS GitLab AMI

These are both useful references for mapping out the TKLX AMI configuration wizard steps.

There is also the Bitnami GitLab documentation:

https://docs.bitnami.com/general/apps/gitlab/

 

NOTE:

For those reading this who wonder why bother creating a TKLX GitLab VM when there is already a GitLab AMI or a Bitnami GitLab VM...

1)  Supporting additional VM formats

2) TKLBAM incremental backup

3) Debian security updates

4) VM tools to manage the VM

5) TKLDev to manage variations

6) The TKLX community

7) Jeremy Davis (what can I say Jed...you are a big reason I am still with TKLX :-)

 

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Your encouragement and kind words are warmly welcomed! :) Also those resources look like they'll be handy once we get to that bit.

Unfortunately, I don't have much to report yet (other than I REALLY hate Omnibus now I know more about it! Bloody Chef and Ruby and ... Argh!). I've already spent hours and can't yet get GitLab to install to the build environment (chroot). I keep thinking I'm close, but no cigar yet... :(

The more time I spend with web apps written in Ruby, the more I wonder why the hell anybody in their right mind would do that!? Anyway, that's enough whinging... Hopefully, I'll have a breakthrough soon and have something to share...

Jeremy Davis's picture

Ok, so there's still plenty of work left to do, but the major hurdle of getting the GitLab Omnibus package to install appears to be complete (some tidying up left to do, but it appears to install ok...).

I currently have it building to ISO so I can do some more development. FWIW SystemD doesn't work properly within a chroot, so building to an ISO (then installing to a VM) makes everything a bit more "real world".

I don't plan to do a ton of battle testing yet. I'll do just enough to make sure that everything appears to work. Then I'll test a few ideas I have for regenerating secrets. Plus make sure that we can set the first (admin) user email and password etc.

If you're at all interested, I have pushed the build code that I have so far (definitely not ready for production!) to a new branch on my personal GitHub. If you do wish to build yourself on TKLDev it should "just work". Essentially though, at this point it's just a vanilla GitLab-CE Omnibus install (no GitLab related inithooks, etc).

If you plan to do any development and/or build multiple times, I recommend that you run make root.patched initially, then copy out the deb package from the apt cache. E.g. from scratch in TKLDev:

cd products
git clone -b omnibus-pkg-install https://github.com/JedMeister/gitlab.git
cd gitlab
make root.patched
# copy the cached Omnibus deb into the overlay so later rebuilds don't re-download it
APT_CACHE=var/cache/apt/archives
mkdir -p overlay/$APT_CACHE    # create the overlay apt cache dir if it doesn't exist yet
cp build/root.patched/$APT_CACHE/gitlab-ce_*.deb overlay/$APT_CACHE/

Not having to download the ~450MB omnibus package each rebuild will certainly make things a bit quicker! Although please note that it's still quite slow. In part because the install takes a while anyway (which won't change). But in part because I'm currently committing large areas of the filesystem to git repos to see exactly what is going on! That will be removed before the final build.

If you just want to build an ISO (or other builds) then you can just cd to products dir and do the git clone first (as per above), then skip the rest and use buildtasks to build the ISO. Once you have an ISO built, you can build the other builds. I.e.:

cd buildtasks
./bt-iso gitlab
# when done you should have the following file:
# /mnt/isos/turnkey-gitlab-15.2-stretch-amd64.iso
# then to build an alternate build, such as Proxmox/LXC
./bt-container gitlab-15.2-stretch-amd64
# or OVA/VMDK:
./bt-vm gitlab-15.2-stretch-amd64

As soon as I'm a bit further along, I'll upload an ISO and a Proxmox/LXC build for you guys to test. This is mostly just so you can see some progress and have a bit of a play if you're super keen.

OnePressTech's picture

Nice work Jed. Disappointingly, on my end I will need to wait 2 weeks while our intrepid national Telcos try to get me on NBN. After a week of Optus stuffing around unsuccessfully, I am running on expensive mobile data in the interim, so big downloads are out for now. I'm in the process of trying to switch to Telstra. Fingers crossed that they can actually get me a working service any time soon. Telstra just announced more technical jobs being shipped to India. Apparently there are no available qualified Telecom people in Australia. Huh...I'm an ex-Telstra qualified Telco engineer...no one called me!

So I am expecting another two weeks of delay before I can test the new VM.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

For fear of getting a bit too political here, seriously the NBN (explicitly MTM/FTTN) has been the most massive Government fail! I'm one of the lucky ones who got "proper" NBN (FTTH/FTTP) and it's been pretty good for me since I got connected (a couple of years ago now) but I've heard plenty of horror stories.

Bit of a pity you won't get a chance to test much within the next couple of weeks. But if I can get it to a point where I'm pretty happy with it before then, I'll see what I can do about getting you access to an AMI.

OTOH, there is also a possibility that I'm happy enough within the next 2 weeks that I may just publish it. But there is nothing stopping you from letting me know about any shortcomings, bugs or improvements we can make. If they're critical bugs, we can re-release ASAP. If they're not so critical, we can bundle them into the next release (which we should be able to do within a month).

One major benefit of the install via Omnibus package will be that generally it will be a breeze to rebuild with an updated version of GitLab! :)

Good luck with your networking woes. Hopefully Telstra comes to the party!

Jeremy Davis's picture

Just a quick update.

The install of GitLab appears to work ok and everything seems to be running as it should. However, the initial interactive inithooks that I have devised do not work as they should. The WebUI still requests that you set a password when you first access it. The email/password set by the inithook doesn't allow login and the password set via WebUI also fails... :(

So lots more work left to do, but definitely progress. FWIW I've pushed my changes back to the WIP GitHub repo (in my personal GH account).

Sytko's picture

Hello. When can I download a ready iso for testing?
Jeremy Davis's picture

Hi there and thanks for your interest and willingness to be involved. I really appreciate it.

Unfortunately, I've had a few other things I've needed to prioritise this week, but I have made a little more progress since last time I posted. I think it's pretty close to being ready for further testing, but it seems unlikely that I'll get there today (so may not be until next week).

One thing that would be incredibly helpful would be to get an understanding of what areas of the filesystem might need to be included in a backup? Anyone who is already using the Omnibus install (Tim? Royce?) might already have some ideas!? It's something I haven't looked at yet at all, but will need to before we do an official release. Any pointers that might save me time would be really appreciated.

Otherwise, if you'd like to build your own ISO using TKLDev, that's an option (so you don't need to wait for me if you're super keen). There is a doc page which should help get you up to speed. It's recommended to build Core first, to make sure you understand what's going on and that everything is working properly. Then, to build the "new" GitLab (dev) appliance as your second ISO, you can use the 'omnibus-pkg-install' branch of my fork of the gitlab repo. I.e.:

cd products
git clone --branch omnibus-pkg-install https://github.com/JedMeister/gitlab.git
cd gitlab
make

(Please also see my notes in this post above).

I have been testing the code (committed as of late last week) and have just committed and pushed the only (minor) changes that I have made since I built the instance I've been testing. It's also worth noting that I plan to move the "scheme" setup from the inithook to a confconsole plugin, as it triggers the Let's Encrypt cert generation. IMO it doesn't really make sense to do that at firstboot, as it's unlikely that most users will have everything in place at that point, and choosing to generate Let's Encrypt certs without DNS pre-configured will cause failure.

At this point, I doubt I will get any further this week, but hope to have something ready for others to test early next week.

OnePressTech's picture

Hey Jed,

Sorry I'm not being much help. I have extracted myself finally from Optus but am now on the Telstra joyride. We'll see how that goes.

I expect to be back on this next week.

Regarding Backup, are we trying to identify volatile / non-volatile folders / files to know which should / should not be backed up by TKLBAM?

GitLab backup may provide some helpful guidance:

https://docs.gitlab.com/ce/raketasks/backup_restore.html

IDEA:

Since we are already following an unconventional journey with this particular VM by installing via omnibus rather than source...why stop there :-)

We could just create a local folder on the GitLab VM, configure / trigger a GitLab backup that dumps everything into that folder, and configure TKLBAM to just back up that backup folder and consider everything else as part of the mirrored image.

So on a restore, we do the opposite, TKLBAM restores the backup folder on a freshly created GitLab VM and then a GitLab restore is triggered. I assume we would not touch TKLBAM so it would probably be a cron script that triggers a restore if the folder date changes...or something like that.

Thoughts?

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

I'm appreciative of any input I get! So it's all good. Plus I totally understand the telco pain...

Regarding Backup, are we trying to identify volatile / non-volatile folders / files to know which should / should not be backed up by TKLBAM?

Exactly! :)

GitLab backup may provide some helpful guidance: https://docs.gitlab.com/ce/raketasks/backup_restore.html

Fantastic, thanks! That will help tons!

We could just create a local folder on the GitLab VM, configure / trigger a GitLab backup that dumps everything into that folder, and configure TKLBAM to just back up that backup folder and consider everything else as part of the mirrored image.

Hmm, that sounds like a very reasonable suggestion. We'd then look to avoid backing up all/most of the rest of GitLab. Although FWIW generally anything included in an apt package (with the exception of the config in /etc and sometimes data within /var), would not be desirable to include anyway.

FWIW, it appears that re GitLab, much (if not all?) of the data in /var (/var/opt/gitlab) is generated by the 'gitlab-ctl reconfigure' command (generated from settings in the /etc/gitlab/gitlab.rb).

Regarding triggering the GitLab backup (and restore), assuming that it can be done via the command line, TKLBAM hooks could easily cope with that!

My only concern regarding your idea would be: what happens if the GitLab version that creates the backup is different to the version it is being restored to? I have no idea how the Omnibus package might cope with that?!

And between GitLab having a rapid release cycle and the ease of update that the Omnibus package will provide, there is a very real chance that the data will be for a different version of GitLab than the one it was created from. FWIW, the current backup somewhat works around that by including GitLab itself (which is a double edged sword, as it requires the user to manually keep the install up to date, and also makes it likely that it won't "just work" when migrating between major versions).

I guess we could consider a restore hook to try to match the installed version with the backup version. But the more stuff scripted during restore, the more factors need to be considered, and the greater the risk of something breaking...

Note though, that this isn't really a specific concern related to your suggestion. It would apply to many other appliances and a "normal" TKLBAM style backup too. We have a few other appliances that use 3rd party repos. But given that the GitLab Omnibus package includes so much software, I anticipate that the implications would be significantly larger and, due to the sheer volume of software installed, issues significantly more likely to occur.

Having said that, TKLBAM's primary aim is to provide reliable backup and restore for a specific single server. Usage as a data migration tool is certainly in scope, but is secondary to that primary objective, and isn't guaranteed to work without some manual intervention. Still, I think it requires some thought.

The other thing that occurs to me is that ideally we would want an uncompressed backup, stored within a consistent place (e.g. no date stamped directories). Otherwise the daily diffs (i.e. daily incremental backups) would potentially be ridiculously large. I am unsure whether the GitLab backup tool would support that, but I'll have a look sometime soon.

Regardless, awesome suggestion and definitely food for thought! :)

OnePressTech's picture

Thanks for everything Jed.

Regarding cross-version backup / restore or stale restores, this is always an issue.

To mitigate the risk the GitLab backup could be set on a daily schedule so that the backup TKLBAM stores is always current.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

If we run the backup as a pre-backup hook, then the backup should always be up to date. Although we need to consider how we might handle things if, say, the GitLab backup fails. I assume the best course would be for the TKLBAM backup to also fail (ideally loudly) in that case.

Another consideration that has just occurred to me is that by default TKLBAM will possibly want to back up the (Postgres) DB. I assume that a GitLab backup would include the relevant DB data? If so, we won't want the DB being included separately as well. TBH I'm not sure how TKLBAM determines which DB is in use? We might be lucky and, because it's not in the default place (i.e. installed by Omnibus), it doesn't even try to back it up. Worst case though, we could explicitly exclude it.
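
FWIW, a minimal sketch of what that explicit exclusion might look like, assuming the Omnibus default Postgres data location (the /etc/tklbam/overrides file with a leading '-' is TKLBAM's standard way to exclude a path):

# exclude the Omnibus-bundled Postgres data dir from the TKLBAM file backup
echo "-/var/opt/gitlab/postgresql" >> /etc/tklbam/overrides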

PS, it doesn't look like I'm going to get any more time to play with it today, but so far the basic install and firstboot seem fairly reliable. The inithooks are still disabled by default, but the interactive one I've been working on appears to work well. And I've tested just manually removing the secrets (from /var/opt/gitlab/) and (re)running 'gitlab-ctl reconfigure' seems quite happy to regenerate them. So that looks like it will work fine. I think I mentioned that the inithooks currently give the option to do the Let's Encrypt setup at first boot, but I plan to move that out to Confconsole I reckon (unless you have a good reason not to).

OnePressTech's picture

If TKLBAM triggers a GitLab backup before it does its diffs that would work.

LetsEncrypt from the console is a good plan. Some DevOps may want to use their own cert rather than LetsEncrypt. Cert-upload should eventually be added as an option in confconsole so the DevOps supplied cert can be auto-installed as part of the installation process.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Awesome, sounds like we're on the same page then. That's great.

Unfortunately, it looks like I have another (programming related) task that I'm going to need to prioritise early next week. Whilst that should be quite fun, I was hoping to get GitLab wrapped up (at least the initial "rc" of the rebuilt appliance) early next week. It seems likely that that will not be a realistic goal now and probably mid-week will be the earliest. Regardless, I'll certainly see what I can do. TBH, it's really close I reckon.

Take care mate and chat more soon.

Sytko's picture

Hi. Following the instructions above, I've created an ISO and will test further. The question is: how do I later update the virtual machine to a new version of GitLab?
Jeremy Davis's picture

Thanks for helping with testing! :)

When you build the iso, it should pre-install the latest version of GitLab-CE by default. From then on, you can follow the GitLab documentation on upgrading (use the instructions for Debian).

If you update regularly, then updating GitLab to the latest version should be as simple as running this:

apt update
apt install gitlab-ce

The only additional thing that you may need to do is update the config file (/etc/gitlab/gitlab.rb), i.e. add/update any options that have changed for the new version.
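
If it's available in your Omnibus version, something like the following can help spot the differences between your current gitlab.rb and the new package's template before applying the config (treat this as a suggestion rather than a required step):

gitlab-ctl diff-config     # show settings that differ from the packaged gitlab.rb template
gitlab-ctl reconfigure     # apply your (updated) config
gitlab-ctl restart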

However, it's important to note that if you fall behind the upstream updates, you will need to update to the latest release of the major version you are on before updating to the next major version. GitLab versioning is explained on this page. Please specifically see the section on upgrade recommendations. Please note that page is for GitLab-EE but also applies to GitLab-CE. (links updated to point to CE docs)

OnePressTech's picture

Gitlab docos are symmetrical...just change /ee/ to /ce/ in the URL.

So per Jed's post above, the upgrade recommendations for CE are at:

https://docs.gitlab.com/ce/policy/maintenance.html#upgrade-recommendations

AND for EE is at:

https://docs.gitlab.com/ee/policy/maintenance.html#upgrade-recommendations

JED: Please adjust your post...it is not always true that EE instructions apply to CE.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Thanks mate. I'll update my post above. I just followed the links that the generic GitLab docs provided. They obviously favour the EE docs when they need to link from generic docs to specific docs (which makes sense). I'll keep your point in mind for future links! :)

Jeremy Davis's picture

I have made a bit of a start on migration documentation. But it didn't take very long for me to start hitting details that required further elaboration and consideration. It's already chewed up a fair bit of time and there is no end in sight...

So I've had to put that aside for now. I won't bother posting what I have yet as it's incomplete and a little contradictory (as I discovered more details). But I already have some quite useful insight into how it might work under different scenarios.

If anyone wants pointers on how I anticipate migration might work from a version of our existing appliance (or any source install for that matter) to the upcoming (Omnibus install) GitLab appliance, please ask. To provide relevant and specific thoughts, I'll need info on the TurnKey version (or Debian/Ubuntu version if not TurnKey) that you currently have, and the version of GitLab that you are running on it.

I'm almost certain that providing info and pointers for a specific scenario will be much easier than needing to consider all possibilities...!

OnePressTech's picture

I agree Jed. Though I am sure TKLX clients would all love fully life-cycled VMs, that is a huge  investment that no VM supplier (Bitnami, AWS, Google, Azure, etc) has committed to. TKLX provides shrink-wrapped VMs...it is up to the VM users to life-cycle them. This is the same no matter who people source their VMs from.

So other than some basic guidance on migrating from old GitLab VM to new GitLab VM I think your approach makes sense. Keep it simple and let the community backfill :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

After a few sidetracks, I finally swung my attention back to GitLab today. And I have good news to report!

I have inithooks and tklbam working (at least in theory). I have tested most of my code on a running dev server, but not from a fresh install from ISO. So there are almost certainly going to be bugs (I'm hoping just typos and other easily fixed minor flaws). I just built a fresh ISO from my build code and have pushed my latest updates back to my repo (as noted above within this thread).

I'm knocking off for the day, so won't get any further now. If anyone gets a chance to test it out in the meantime, I'd love some feedback. Please post here with any issues you strike. If/when I hit any bugs when testing tomorrow, I'll post back here too so we are all on the same page.

Once I have done a bit more testing, and ironed out any of the above mentioned bugs (that I'm sure will exist), I think we'll be ready for proper "battle-testing". I also need to write up a bit of documentation and I'll clean up the commit history before I push to the proper TurnKey repos. But all-in-all I'm really happy with the progress.

I don't want to jinx myself, but I'm thinking we may be able to publish next week. Even if I can't get any of you good people to assist initially, if I can confirm that everything works (e.g. inithooks and a backup from one server and restore to another) then I might even push ahead with publishing v15.2. If we strike any new bugs I miss, then no reason why we can't do another release soon after if need be...

OnePressTech's picture

Cheers Jed for all your hard work. In a few days I will be able to do some testing and am available to work with you on documentation as well.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

First minor flaw discovered and fixed! (I forgot to re-enable the gitlab inithooks - doh!)

I'm now looking at improving the inithook (password setting component) as it displays the GitLab password set by the user in plain text (which I really don't like for obvious reasons). I'm not 100% sure but I think it should be pretty easy...

I also haven't had a chance to test the backup. At this point, it will fail loudly if the GitLab version is different to the version backed up. I have an idea on how we might automate that, but I'm not sure how reliable it will be (and won't work for backups of previous TurnKey source installed appliances). Once I've had a chance to do some basic testing, I'll upload the tklbam profile somewhere to make it easy to test.
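
For the curious, the guard I have in mind is roughly the following (the version file and its path are illustrative, not lifted from the actual hook script):

# fail loudly if the installed GitLab doesn't match the version recorded at backup time
backup_ver=$(cat /var/cache/tklbam-gitlab/gitlab-version 2>/dev/null)   # hypothetical file written by the backup hook
installed_ver=$(dpkg-query -W -f='${Version}' gitlab-ce)
if [ "$backup_ver" != "$installed_ver" ]; then
    echo "FATAL: backup was made with gitlab-ce $backup_ver but $installed_ver is installed" >&2
    exit 1
fi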

Jeremy Davis's picture

I'm pretty happy with everything so far, but logging in after firstboot is problematic... It appears that something isn't running as it should at first boot. But unfortunately, I'm struggling to debug it.

If I try to log in with the credentials I supply interactively on firstboot (via the inithook I have created) then at best I get "Invalid Login or password.", but I've also been getting random 422 and 500 errors too! And nothing remotely useful from the GitLab logs - at least not that I can find... The 422 errors seem to be related to CSRF tokens (protection against cross-site request forgery) and the 500 errors are usually something up with the GitLab backend (at least that was the case with our old source install). Unfortunately, the only place anything regarding these errors appears to be logged is the (Omnibus) Nginx logs, and they essentially just give me the same info as the webpage (i.e. the error number). Not very helpful!

The weirdest thing is that if I manually reinvoke the inithook (entering exactly the same info that I did the first time), everything works as it should and I can log straight in! And that is the case whether I re-run them interactively or non-interactively. It only seems to be on firstboot that it doesn't work.

I'm almost certain that it's either something not ready at the point of firstboot when I'm trying to do stuff, or a race condition. Although I haven't been able to find anything meaningful in the logs, so I'm only guessing.

I'm pretty sure it's not explicitly anything wrong with GitLab, but I can't help but get frustrated with it! Anyway, I do have some ideas on how to force it to log the output on firstboot (inithooks really should have better logging options IMO), so will try that and then I'll probably knock off for the day.

If anyone wants to test backups, you'll want the tklbam hook script and conf file from the buildcode overlay (they go in /etc/tklbam/hooks.d/) and the profile, which I've just (temporarily) uploaded to the repo as well (here - if the link doesn't work, then please let me know, although I will likely delete it at some point soonish). Note that I have not properly tested the hook script, or the tklbam profile, so DO NOT use either on a production server!!! It's unlikely to cause any damage, but I can't guarantee it. It's also likely to not work...

Jeremy Davis's picture

I just wanted to note that I've worked through the issue that stumped me late last week. I think I'm also overdue for a progress report anyway....

So it seems that I was a bit quick to lambaste GitLab on this occasion... (What really...?!) :)

Generally, the issue was an intersection of SystemD, GitLab's default systemd service/unit file, and the fact that the inithooks run really early in the boot process. Whilst a bit of a pain, it has allowed me to get a deeper understanding of SystemD and GitLab too.

The issue specifically was that GitLab wasn't yet running when the inithooks were trying to configure it, hence why it failed miserably on firstboot, but worked consistently later on.

I've worked around that by providing a temporary, inithooks-specific firstboot GitLab systemd unit file, so it starts early when required. The new service is then stopped once the config is complete. The "proper" GitLab service then starts as per usual, at its allotted point within the boot process.
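
In rough terms, the approach looks something like the sketch below. This is illustrative only, not the actual unit file: it assumes the Omnibus supervisor is normally started by gitlab-runsvdir.service (via runsvdir-start) and that TurnKey's firstboot scripts run from an inithooks.service unit.

cat > /etc/systemd/system/gitlab-firstboot.service <<'EOF'
[Unit]
Description=GitLab (temporary service so firstboot inithooks can configure it)
Before=inithooks.service

[Service]
ExecStart=/opt/gitlab/embedded/bin/runsvdir-start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl enable gitlab-firstboot.service
# the inithook stops (and disables) this again once the firstboot config is complete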

FWIW, it might make booting marginally quicker to adjust the default GitLab service file to consistently start earlier (it seems to run fine, even really early in the boot process), however as that is part of the Omnibus package, fiddling with that seems a bad idea...

I have been pushing all my updates to my repo. I make a habit of doing that every day after I've worked on it, so feel free to track my progress more closely via the commits log. As you can see, my commits are quite atomic, so it probably provides a pretty good indication of what I've been up to... Once I am happy with it all, I'll rewrite/squash the commit history before merging into the TurnKey repo (lots of the commits are things I've tried then backed out of, or changed direction on - so the complete commit history is of limited value IMO).

FWIW I'm now continuing work on the TKLBAM config. Leveraging the GitLab backup mechanism makes the most sense really, especially considering that TKLBAM doesn't appear to see the GitLab Postgres installation. I have the basics working ok, but trying to get it all to work nicely within the limitations of GitLab backup and TKLBAM is quite complex really. Much more complex than I initially anticipated. But I am having some wins so far. I think that I'm also trying to do a bit too much with the backup really... But I just can't help myself! :)

I still need to get onto writing up some docs as there will be a lot of nuance to the TKLBAM requirements I suspect.

OnePressTech's picture

Sounds good Jed. I will be starting to play with it over the weekend. I look forward to seeing your handiwork up close :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

I was hoping to get a bit further today, but at this stage it looks unlikely...

Hopefully you'll get a chance to check it out over the weekend. Please let me know how it goes and any thoughts you have (good, bad or otherwise).

FWIW as the tklbam hook has grown (in size and complexity) I have been considering whether I should persevere with the bash hook I have, or whether it might be better to rewrite it in python?! If you have any thoughts at all on that (language and/or functionality and/or other ideas) please share.

Also, if you need more info about using the TKLBAM profile, please hit me up via text message or something. Otherwise, I'll be back onto it on Monday.

Jeremy Davis's picture

Ok I'm back into it and hoping to get the basics of the tklbam profile finished within the next day or 2. I have a few other apps that need updating so hopefully I can get GitLab into that batch. If I can't get there, I may have to hold off further.

Also one thing that I probably should note is how to use the tklbam-profile. On your gitlab server, this is how it's done:

cd /root
wget https://github.com/JedMeister/gitlab/raw/omnibus-pkg-install/turnkey-gitlab-15.3-stretch-amd64.tar.gz
mkdir tklbam-profile
tar xzvf turnkey-gitlab-15.3-stretch-amd64.tar.gz -C tklbam-profile

Then finally initialise TKLBAM with the profile (and your HUB_API_KEY):

tklbam-init $HUB_API_KEY --force-profile=tklbam-profile

Note too, that this should also work if you have GitLab installed via Omnibus (e.g. Core with Omnibus GitLab installed). Although you will also need the hook script (and conf file). This should do the trick:

path=etc/tklbam/hooks.d
files="gitlab gitlab.conf"
url=https://raw.githubusercontent.com/JedMeister/gitlab/omnibus-pkg-install/overlay
mkdir -p /$path    # in case the hooks.d dir doesn't already exist
for file in $files; do
    wget $url/$path/$file -O /$path/$file
done
chmod +x /$path/gitlab    # the hook script needs to be executable for TKLBAM to run it

PS I just realised that there is a dumb mistake in the hook script. I've fixed it and pushed back to the repo.

Jeremy Davis's picture

I've really been pulling my hair out... TBH, I'm a bit stuck and I'm not really sure where to go from here...

It seems that restoring a GitLab backup (created by GitLab installed via Omnibus; restored to exactly the same version of GitLab, along with the config and secrets files as noted in the docs) causes GitLab to stop working...! :(

The exact issue is that all seems well until you try to log in. As the 'root' user, login appears to proceed, but then GitLab gives a 500 error.

I can reliably and consistently reproduce this error using backups of various different v10.x and v11.x versions. Googling returns tons of results, dating all the way back to v8.x (possibly beyond) which suggests that this issue is not new. There is a chance that I'm doing something that GitLab doesn't expect, but IMO, it should "just work". Anyway, here's the stacktrace when the error occurs:

==> /var/log/gitlab/gitlab-rails/production.log <==
[...] "✓", "authenticity_token"=>"[FILTERED]", "user"=>{"login"=>"root", "password"=>"[FILTERED]", "remember_me"=>"0"}}
Completed 500 Internal Server Error in 248ms (ActiveRecord: 23.1ms)
  
OpenSSL::Cipher::CipherError ():
  
lib/gitlab/crypto_helper.rb:27:in `aes256_gcm_decrypt'
app/models/concerns/token_authenticatable_strategies/encrypted.rb:55:in `get_token'
app/models/concerns/token_authenticatable_strategies/base.rb:27:in `ensure_token'
app/models/concerns/token_authenticatable_strategies/encrypted.rb:42:in `ensure_token'
app/models/concerns/token_authenticatable.rb:38:in `block in add_authentication_token_field'
app/services/application_settings/update_service.rb:18:in `execute'
lib/gitlab/metrics/instrumentation.rb:161:in `block in execute'
lib/gitlab/metrics/method_call.rb:36:in `measure'
lib/gitlab/metrics/instrumentation.rb:161:in `execute'
app/controllers/application_controller.rb:467:in `disable_usage_stats'
app/controllers/application_controller.rb:453:in `set_usage_stats_consent_flag'
lib/gitlab/middleware/rails_queue_duration.rb:24:in `call'
lib/gitlab/metrics/rack_middleware.rb:17:in `block in call'
lib/gitlab/metrics/transaction.rb:55:in `run'
lib/gitlab/metrics/rack_middleware.rb:17:in `call'
lib/gitlab/middleware/multipart.rb:103:in `call'
lib/gitlab/request_profiler/middleware.rb:16:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:13:in `call'
lib/gitlab/middleware/correlation_id.rb:16:in `block in call'
lib/gitlab/correlation_id.rb:15:in `use_id'
lib/gitlab/middleware/correlation_id.rb:15:in `call'
lib/gitlab/middleware/read_only/controller.rb:40:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/request_context.rb:20:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:29:in `call'
lib/gitlab/middleware/release_env.rb:13:in `call'
Started GET "/favicon.ico" for 127.0.0.1 at 2019-03-06 06:14:53 +0000

I've found a workaround (and confirmed that it actually works), but I'm not really clear on the larger implications of applying it. It seemed to have no adverse effect on my minimal test data set (a single user with a repo and a couple of issues), but it'd be great to be able to get others to test it, ideally on a decent dataset. Especially one that is configured to run tasks.

FWIW, here's the workaround:

cat > /tmp/fix.rb <<EOF
settings = ApplicationSetting.last
settings.update_column(:runners_registration_token_encrypted, nil)
EOF
chown git /tmp/fix.rb
gitlab-rails runner -e production /tmp/fix.rb && gitlab-ctl restart
rm /tmp/fix.rb

As you can probably guess from the code, it wipes out the encryption for the runner tokens, although what the full implications of that might be are unclear to me...

OnePressTech's picture

After restoring the data are you resetting the GitLab instance?

If not, try executing the following after a restore (in this order):

gitlab-ctl reconfigure
gitlab-ctl restart

 

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

FWIW the gitlab_restore function that I have constructed is within the TKLBAM hook script. To save you from decoding my bash, here is the essence of what it does:

gitlab-ctl reconfigure
gitlab-ctl stop unicorn
gitlab-ctl stop sidekiq
gitlab-rake gitlab:backup:restore BACKUP="relevant_backup"
gitlab-ctl restart
gitlab-rake gitlab:check SANITIZE=true

So it looks like I'm not actually re-running 'gitlab-ctl reconfigure' AFTER restoring the backup! I suspect that's the issue! Also TBH, I don't recall why I'm running 'gitlab-ctl reconfigure' BEFORE I run the restore?! Seems a bit redundant in retrospect...

Armed with your input and my reflection, I'll try tweaking the script a little and see how we go.

To be explicit, I'll try moving the 'gitlab-ctl reconfigure' from before the restore, to afterwards (between the restore step and the restart step as you suggest).

Also do you have any thoughts on the value of running the check? Perhaps it's not really of value there and just slows things down?

OnePressTech's picture

See 500 Error on login after restoring backup

The reason you may have added the gitlab-ctl reconfigure before the restore is that the GitLab Omnibus restore pre-requisites include the following requirement: "You have run sudo gitlab-ctl reconfigure at least once." The reality is that you can't install GitLab without running gitlab-ctl reconfigure, so I am not sure of the purpose of that requirement.

So the revised restore would be...

	gitlab-ctl stop unicorn 
	gitlab-ctl stop sidekiq 
	gitlab-ctl status 
	gitlab-rake gitlab:backup:restore BACKUP="relevant_backup" 
	gitlab-ctl restart 
	gitlab-ctl reconfigure 
	gitlab-rake gitlab:check SANITIZE=true 

NOTE: From the GitLab restore documentation

To restore a backup, you will also need to restore /etc/gitlab/gitlab-secrets.json (for Omnibus packages) or /home/git/gitlab/.secret (for installations from source). This file contains the database encryption key, CI/CD variables, and variables used for two-factor authentication. If you fail to restore this encryption key file along with the application data backup, users with two-factor authentication enabled and GitLab Runners will lose access to your GitLab server.

Nice progress Jed...getting there :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Awesome stuff. That sounds good. I'm really hoping moving that gitlab-ctl reconfigure does the trick (I suspect that it will). On reflection, I'm not really sure why that didn't occur to me previously (nor why none of the many threads I've read double-checked that with users experiencing the issue). Anyway...

FWIW the whole /etc/gitlab dir (i.e. gitlab-secrets.json & gitlab.rb, plus any TLS certs that have been generated) is included in the backup by TKLBAM itself. All the rest of the GitLab directories (i.e. /opt/gitlab & /var/opt/gitlab) are explicitly excluded. The GitLab backup runs prior to TKLBAM doing anything. The file is then transferred out of the GitLab backup directory (/var/opt/gitlab/backups by default) to a location that is included in the backup (currently /var/cache/tklbam-gitlab). Then TKLBAM does its thing...

I've also renamed the backup file (and stored the original name in a text file to support the restore process). That way, assuming that GitLab tars up the files in the same order, unless lots changes, the daily incremental backups should still be quite small. The rationale is that if the file is named as per the GitLab naming convention, as far as TKLBAM is concerned it will be a new file every day. Giving it a consistent name makes TKLBAM realise that it's essentially the same file, just not exactly the same (so it does a binary diff). TBH, I haven't actually double checked that my understanding is correct, which I should do. Because if it makes no difference, it's probably better to just move it, rather than renaming it too.
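
To make that concrete, here's a rough sketch of the pre-backup steps just described (paths are the ones mentioned in this thread; variable and file names are illustrative rather than copied from the actual hook script):

backup_dir=/var/opt/gitlab/backups     # GitLab's default backup location
cache_dir=/var/cache/tklbam-gitlab     # location that IS included in the TKLBAM profile

gitlab-rake gitlab:backup:create       # produces e.g. TIMESTAMP_..._gitlab_backup.tar
mkdir -p $cache_dir
latest=$(ls -t $backup_dir/*_gitlab_backup.tar | head -n1)
basename $latest > $cache_dir/original-name    # remember GitLab's name so restore can put it back
mv $latest $cache_dir/gitlab_backup.tar        # consistent name keeps the daily binary diffs small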

It has also occurred to me to untar the GitLab backup before doing the TKLBAM backup. TKLBAM already tars everything up prior to upload, so there is no increase in backup size by doing that (it may actually decrease the size of the backup?!) If we went that way, you could be assured that the daily incremental backup size would only increase directly relative to the new files, etc. OTOH, for many users it may not make much difference, and it means an additional operation at backup time (untarring the GL backup) and restore time (tarring it back up so GL will recognise it). All that additional processing means additional opportunities for stuff to go wrong, so I'm inclined to leave it be... (Simply rename it to a generic name).

As per always, interested in any input you (or anybody else) has.

Also, unfortunately I've been dragged away onto other stuff now. So it seems unlikely I'll be able to get back to this until next week... I think I'm really close now though! :)

OnePressTech's picture

The reason I added the reminder is that the script you and I listed above only includes:

gitlab-rake gitlab:backup:restore BACKUP="relevant_backup"

That restores everything BUT the  gitlab-secrets.json file.

FYI - there is a 500 error issue for missing  gitlab-secrets.json:

https://gitlab.com/gitlab-org/gitlab-ce/issues/49783

Thanks again for all your hard work...Very much appreciated :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

With a few explicit exclusions, the whole /etc directory (including /etc/gitlab) is included in the TKLBAM backup as part of the normal profile. So the gitlab-secrets.json file (and the config file, plus TLS certs) are all automagically restored by TKLBAM before any of the rest of this stuff happens. I appreciate the reminder though, as I hadn't committed and pushed back the other required changes to the TKLBAM profile repo. So to avoid the risk of forgetting, I've just done that now. :)

And actually, I wonder if re-running gitlab-ctl reconfigure with a different gitlab-secrets.json file (and then not re-running it after the restore) is perhaps part of the issue in the first place? TBH, I hadn't considered that before, but it actually seems plausible...

Anyway mate. Thanks again for your input. Hopefully I'll be able to tidy this up early next week. Cheers.

OnePressTech's picture

gitlab-rake gitlab:backup:restore BACKUP="relevant_backup"

Does not restore gitlab-secrets.json.

I know you are backing it up...what instruction is restoring it?

And yes...the gitlab-ctl reconfigure should be run AFTER the gitlab-secrets.json file is restored.

I expect that we're on the same page :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

I understand that neither gitlab-secrets.json nor the gitlab.rb config file, are included when gitlab-rake gitlab:backup:restore BACKUP="relevant_backup" is run.

But because TKLBAM already includes most of /etc (with some exclusions) and gitlab-secrets.json (and gitlab.rb) are stored in /etc/gitlab, they are automagically included in the normal TKLBAM backup. I.e. no additional command/inclusion/etc is required to include them.

That too may have been part of the issue with the 500 errors. The restore process that I've scripted runs post TKLBAM restore. So the backed up gitlab-secrets.json file has already been restored when the restore component of the Gitlab specific hook script runs. But then I was running gitlab-ctl reconfigure (i.e. original data, with restored gitlab-secrets.json). Then restoring the data and not re-running gitlab-ctl reconfigure.
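
So, pulling the above together, the restore ordering we seem to have converged on looks roughly like this (a sketch only; whether the restart comes immediately before or after the reconfigure is still something I want to test):

# runs as a post-restore hook, i.e. after TKLBAM has already put /etc/gitlab back in place
gitlab-ctl stop unicorn
gitlab-ctl stop sidekiq
gitlab-rake gitlab:backup:restore BACKUP="relevant_backup"
gitlab-ctl reconfigure     # now runs against the restored data *and* the restored secrets
gitlab-ctl restart
gitlab-rake gitlab:check SANITIZE=true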

Hopefully I should be able to get back to this today. Armed with this additional info, I'm really confident that it won't take much more work to get it going. Then it'll just require the docs to be written. That may take a little more time, but hopefully shouldn't be too bad.

OnePressTech's picture

Cheers Jed :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Ok so it all looks pretty good at this point. Thanks to your hints on the order that I was doing things during restore. I'm almost certain that was the cause of the 500 errors (hadn't run reconfigure post restore with matching secrets file in place). Admittedly it was a limited dataset I tested the backups with, but that part relies on the GitLab backup/restore mechanism, so I'm pretty confident that is good. And my backup from one test server restored nicely on my second (clean install) test server and everything appeared to be as it should.

So all I need to do is tidy up the code a little, rewrite the history (my repo is 51 commits ahead of master - which is probably a bit excessive...) and do some last minute testing. I've made a start on the doc page and hopefully it shouldn't take too long to get it finished.

If all things go as planned tomorrow, I'll be doing the build. Probably publish early next week. :)

There are still a few "nice to haves" that would be good to include, but I think at this point, I'll just add them to tracker and leave them for another day...

OnePressTech's picture

I'll do some testing on the weekend.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

FWIW, I've run out of time now today, but I have added the TLS (Let's Encrypt) cert stuff to Confconsole. I was going to leave that for now, but figured I may as well fix that little bit now...

I was hoping to get the build done, but didn't quite get there... :( Oh well. Monday it will be. If you get a chance to test that'd be great, but if you don't (or if you do and don't find any glaring issues) I'll aim to tidy up the commits and build it Monday, with a plan to publish ASAP. If there are bugs I've missed in the released version, I'll just fix it and re-release ASAP.

To reiterate the process of creating a TKLBAM backup using the profile in the repo and the hook script (with a minor update - the name of the tklbam profile file):

cd /root
wget https://github.com/JedMeister/gitlab/raw/omnibus-pkg-install/turnkey-gitlab-15.2-stretch-amd64.tar.gz
mkdir tklbam-profile
tar xzvf turnkey-gitlab-15.2-stretch-amd64.tar.gz -C tklbam-profile

Then finally initialise TKLBAM with the profile (and your HUB_API_KEY):

tklbam-init $HUB_API_KEY --force-profile=tklbam-profile

Note too, that this should also work if you have GitLab installed via Omnibus (e.g. Core with Omnibus GitLab installed). Although you will also need the hook script (and conf file). This should do the trick:

path=etc/tklbam/hooks.d
files="gitlab gitlab.conf"
url=https://raw.githubusercontent.com/JedMeister/gitlab/omnibus-pkg-install/overlay
mkdir -p /$path    # in case the hooks.d dir doesn't already exist
for file in $files; do
    wget $url/$path/$file -O /$path/$file
done
chmod +x /$path/gitlab    # the hook script needs to be executable for TKLBAM to run it

OnePressTech's picture

I'll probably do a first pass on the weekend and then do another pass after your Monday / Tuesday update.

Cheers,

Tim (Managing Director - OnePressTech)
