EMERGENCY - production server full

Nikos wrote:

Hi all

I have a Joomla application running in an Amazon EC2 small instance. This reports that my main filesystem has been allocated 10GB. Usage has been increasing gradually even though I have been gzipping and deleting old log files. I am now at 96%!

Webmin is not operating correctly due to this and I am not able to log into the backend of my joomla application.

Checking the details from Amazon I noticed that the EC2 small instance is supposed to have 160GB of space, but I only have 10GB allocated to my main filesystem partition. Is there any way to extend my primary partition in order to utilize the other 140GB of space?

Any help would be greatly appreciated, as I have a live site on that instance.

Jeremy Davis wrote:

Please note I'm not that experienced with AWS and so it's possible that some of what I say here is not a good idea for some reason (although it all seems ok to me...). If someone knows better please post and let me (us) know! Thanks.

Anyway, I just launched a server to see what is going on, and it seems that there are 3 partitions by default: sda1, sda2 and sda3. Here's what fdisk shows on mine:

# fdisk -l

Disk /dev/sda1: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda1 doesn't contain a valid partition table

Disk /dev/sda2: 160.1 GB, 160104972288 bytes
255 heads, 63 sectors/track, 19464 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda2 doesn't contain a valid partition table

Disk /dev/sda3: 939 MB, 939524096 bytes
255 heads, 63 sectors/track, 114 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda3 doesn't contain a valid partition table

So it looks like sda1 is the 10GB you're talking about and sda3 is the swap. And the 160GB sda2?!? I wonder...

AFAIK you can ignore the complaints that it "doesn't contain a valid partition table". Normally that would indicate an unformatted drive, but it seems to be different with EC2. Check to make sure yours is mounted with this:

# mount|grep sda2
/dev/sda2 on /mnt type ext3 (rw)

As you can see, there is 160GB sitting there unused; mine is mounted on /mnt.
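
You can also check the free space on it with df (this is from my instance; obviously yours will differ):

```shell
# Human-readable usage for the instance storage volume mounted on /mnt:
df -h /mnt
```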

So I've got an idea. But before you proceed, make sure you have a backup in case something goes wrong! I'm serious - you could really break something!!

To make use of this 160GB I'd suggest that you just mount it over the folder where all your data is. If you're not 100% sure where that is, you can have a look using the du -sh command (-s is single depth, ie just the files and folders in the specified location, & -h is human readable, ie KB/MB/GB instead of bytes). This is what a clean install of fileserver gives me:

# du -sh /* |grep -v 0
27M    /boot
124K    /dev
11M    /etc
143M    /lib
16K    /lost+found
44K    /mnt
du: cannot access `/proc/19378/task/19378/fd/4': No such file or directory
du: cannot access `/proc/19378/task/19378/fdinfo/4': No such file or directory
du: cannot access `/proc/19378/fd/4': No such file or directory
du: cannot access `/proc/19378/fdinfo/4': No such file or directory
44K    /root
5.6M    /sbin
12K    /srv
351M    /usr
166M    /var

(the grep -v 0 excludes 0-length entries). Obviously in a clean install, system components (which you probably won't want to try moving) are taking up most of the space. In yours it should be your data, and hopefully it will be pretty obvious. You can narrow down the location further by changing the path given to du, ie to look inside /var use:

du -sh /var/* |grep -v 0

Once you have found the place where you want to mount your 160GB, move the contents to sda2 and then remount it where they were. I'll give an example where I mount sda2 to /var/www (you may need to stop services such as Apache to get this to work cleanly):

mv /var/www/* /mnt
umount /mnt
mount /dev/sda2 /var/www
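
A quick sanity check afterwards (nothing here is destructive - it just reads the mount table and disk usage):

```shell
# Confirm sda2 is now mounted on /var/www, then see how much space root has:
mount | grep sda2
df -h /
```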

Done! Hopefully that solves your issue... Let us know how you go.

Liraz Siri wrote:

Your solution would also work but mount --bind is a bit more general. You can just redirect any part of the filesystem you want to instance storage on /mnt.

Jeremy Davis wrote:

That's useful. I recall you suggesting that, to redirect /tmp, for those having issues with large TKLBAM restores. That sounds like a much better plan. I'll post a bit more below...

Liraz Siri wrote:

See the new tutorial I linked to below. ncdu is pretty sweet.
Nikos wrote:

Wow Jeremy! You really are number one on this support forum for a good reason.

I am in a very difficult position right now - with 96% of my 10GB partition used I can't seem to do basic things (like take a backup) or perform selects on DB tables, as the system keeps reporting that there is no space.

I tried running the tklbam process from Webmin but that too failed, reporting no space. The last backup image I have is over a month old :S

I have tried executing all sorts of find commands with different combinations to track down the culprit chewing up all my space, but even then I get no joy.

I tried running the following to find a list of all the big files, but it returned an error:

find / -type f -size +20M -exec ls -lh {} \; 2> /dev/null | awk '{ print $NF ": " $5 }' | sort -nk 2,2

when I try to excute the following

du -h / | grep "^[2-9][0-9][0-9][0-9.]*M"

to check which folders on the system are larger than 200MB, I get a very small list; among them is my website residing in /var/www/, which reports 530MB in size!

I have no idea what is chewing up all my other space - checked log files too - nothing suspect there.

Anyway, I am thinking to just wing it and give your recommendation a go - I've copied all the website files along with an SQL dump, so I guess if worst comes to worst I could always set up another server and copy the files over - not ideal I know, but at least I've managed that.

Liraz Siri wrote:

There's 160GB of instance storage available on /mnt. You can't just extend the partition into it in place, but you can do something that is very close.

I recommend you move over the directories that are filling up like this:

mkdir -p /mnt/tmp
mount --bind /mnt/tmp /tmp

cp -a /path/to/big /mnt/big
mount --bind /mnt/big /path/to/big

The mount --bind will not survive a reboot, so I recommend you configure your /etc/fstab to set up the bind mounts on boot. You can also do this via Webmin (Hardware -> Filesystems).
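
For example (using the same hypothetical paths as the commands above), the fstab entries could look like:

```
/mnt/tmp  /tmp          none  bind  0  0
/mnt/big  /path/to/big  none  bind  0  0
```

With those in place the bind mounts are recreated automatically at boot.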

Jeremy Davis wrote:

I didn't think of the --bind switch (in fact, it was only from your examples to those having issues with big TKLBAM restores that I came across it). It seems much more suited to the job at hand, especially as there will no doubt be multiple big folders.

Just one thing: if you wish to recover the used space on sda1, don't you actually have to move (mv) the files rather than just copy (cp) them? Although I know they are hidden by the mount, AFAIK the data still physically resides on the disk. With 160GB to spare it's probably not a huge issue, but I would still think of it as best practice. Is there a compelling reason to leave the data there?
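
Something like this is what I have in mind (same hypothetical /path/to/big as in your example; you'd want to stop any services using the directory first):

```shell
# Move the data (reclaiming the space on sda1) rather than copying it,
# then bind mount the new location back over the original path.
mv /path/to/big /mnt/big
mkdir /path/to/big                  # recreate the (now empty) mount point
mount --bind /mnt/big /path/to/big
```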

And good catch on the mount-on-boot thing (ie fstab). That completely slipped my mind. Obviously any manual mounts are lost on reboot, so they need to be set in fstab.

Jeremy Davis wrote:

And ncdu looks like a much better command to use. I didn't know about that one!

Only thing is, can you please check your explanation of EBS vs S3 backing types? I think you have them the wrong way round. Doesn't EBS survive stopping? Either that or I'm completely confused!

Liraz Siri wrote:

The explanation in the tutorial is correct though there may be room for improving the wording. Not sure about that.

What you have to keep in mind is that whenever you start an instance (either S3-backed or EBS-backed) it launches on a new physical host. If you stop an EBS-backed instance and start it a year later it's extremely unlikely to start up again on the same physical server.

But instance storage is implemented as a volume on the physical server's hard drive. That's why if you stop an EBS-backed instance the instance storage is lost - the instance resurrects on another server.

What may be confusing you is where the root filesystem lives. EBS-backed instances can be stopped and later started (really resurrected on another server) because the root filesystem is on an EBS volume in the network, not local instance storage on some hard drive.

OTOH, S3-backed instances have the root filesystem on instance storage. That's why you can't stop and start them at all. Get it?

Jeremy Davis wrote:

I'll sit and ponder this and post back properly later...

...ok I'm back. I think I have it now. Where my confusion was coming from was that I was under the impression that EBS volumes were persistent, ie that while you pay your monthly fees the data remains indefinitely - even if you have no servers running. Obviously it doesn't quite work like that.

Alon Swartz wrote:

EBS volumes are persistent storage, and any data stored on them will remain intact even when they aren't attached to a server. That's what EBS volumes are for - think of them as external hard drives you can attach and mount to a physical server.

Jeremy Davis wrote:

So in theory they are completely persistent (my original thinking...) but in practice not quite so... Is that right? That's how I understand what Liraz wrote:

For EBS-backed instances: instance storage survives reboots but doesn't survive if you stop an instance or destroy it. This is because stopping an EBS-backed instance detaches it from the physical host. When you restart your server (e.g., a year later) it will probably be resurrected on an entirely different physical host.

Oh, hang on... I just read a little more... I think I get it now. So there is a difference between the EBS backing of an instance vs EBS volumes?! Is that right?

PS Sorry to hijack your thread Nikos

Jeremiah wrote:

I second Alon. Your first impressions were right. EBS always persists no matter what happens to the instance.

I've added a comment to the documentation that hopefully will help clear things up.  We can move the conversation there too if it helps.

Nikos wrote:

Hi guys - thanks for the fantastic response - I am dying to try all of these solutions out. Will keep you all posted on my progress. The thing that I find very strange is that when I run:

du -sh /* |grep -v 0

my /var folder reports 3.8G.

When I run the following command to drill into /var, the folder sizes reported are nowhere near 3.8G:

 du -sh /var/* |grep -v 0

135M    /var/cache
96M     /var/log
16K     /var/mail
64K     /var/run
576K    /var/spool
296K    /var/webmin
577M    /var/www

Am I missing something?

Jeremiah wrote:

I would remove the grep -v 0. I wonder if that is removing some of the other directories unintentionally. Just use 'du -sh /var/*'. It won't be a long list, and hopefully it will show the large directory you are looking for.
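
For what it's worth, grep -v 0 drops any line that contains the digit 0 anywhere, so sizes like 3.0M or 104K disappear along with their directories. Sorting instead keeps everything visible (this assumes GNU sort, which supports -h for human-readable sizes):

```shell
# Rank everything under /var by size, largest first, with nothing filtered out:
du -sh /var/* | sort -rh
```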

Jeremy Davis wrote:

That is true. I used it with success at the lower levels, but you are right - it is possible that it is removing other entries.

Better still if you use the ncdu command Liraz mentions. I haven't tried it yet but looks like a killer commandline tool.

Jeremiah wrote:

Edit: Sorry, Ignore my post here.  I was in a rush to offer help and didn't see this was already mentioned.

I use 'du -sh /*' to find which directory is taking so much space. The 's' stops at the first level of directories rather than diving into every one. Then you can progress into each level as needed, for example 'du -sh /var/*'. See how much space tklbam-backup metadata is taking: 'du -sh /var/cache/duplicity'

Jeremy Davis wrote:

I think the more the merrier! Personally I always appreciate other's input. Often I end up learning something! :)
