Adrian Moya's picture

[Update 30/09/2010]

I've made some minor changes to this patch based on feedback and research. The following is the list of changes:

- Reconfigured clamav with the right driver. Reading the dansguardian documentation, I discovered that I was using the deprecated driver for the clamav contentscanner. I don't know why the dansguardian package depends on clamav and not clamav-daemon, but I updated this configuration as recommended. Virus scanning was tested using the recommended method.

- Configured squid as a transparent proxy. Whether it works is still pending: I can't test it in a VM at my job, and I haven't managed to at home. Maybe this weekend. Or can anyone in the community test whether it's working? In theory, you run this on a server and clients get proxied transparently. I didn't like the idea of ARP spoofing, as it's a kind of hacker attack, and we don't want to end up shipping a hacker appliance for the bad guys. You can do all kinds of nasty things with that working on a network. So let's try the transparent proxy and see if it suits our needs.

- Removed WPAD support. Yes, I read that it never became a standard, that it works in some browsers and not in others, and that its configuration needs more hacking, so I just removed it in the hope that the transparent proxy will do the trick.

- Added extra iptables rules to prevent users from circumventing the filter.

[/update]

This TKLPatch aims to build a web filter proxy that provides administrative control over the content that our users can access on the web. It was built following some suggestions from a previous thread in this forum. It combines a squid proxy with dansguardian, and includes sarg for reporting and clamav for virus scanning. There are many decisions to be made for an appliance like this, so comments and suggestions from other members of this community with experience in this kind of appliance are welcome.

Important: [added 30/09/2010]

This appliance is affected by the bug in turnkey core that prevents the system from getting a nameserver. That said, please check that you got a nameserver from your DHCP server. If not, refresh the network settings and restart squid. If you start without internet, squid won't work as it should. This is important to take into consideration for your tests. Of course, this won't be an issue when the final turnkey core is out.
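
The nameserver check described above could be scripted along these lines. This is only a sketch: the `has_nameserver` helper and the `RESOLV_CONF` override are names made up for illustration, and it assumes the resolver configuration lives in the usual /etc/resolv.conf.

```shell
#!/bin/sh
# Succeeds if the given resolv.conf-style file lists at least one nameserver.
# (has_nameserver is a hypothetical helper, not part of the patch.)
has_nameserver() {
    grep -Eqs '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1"
}

# RESOLV_CONF can be overridden to try the check against a scratch file.
RESOLV_CONF="${RESOLV_CONF:-/etc/resolv.conf}"

if has_nameserver "$RESOLV_CONF"; then
    echo "nameserver found in $RESOLV_CONF"
else
    echo "no nameserver in $RESOLV_CONF: refresh network settings, then restart squid" >&2
fi
```

If the check fails, refreshing the network settings and then running `service squid3 restart` matches the advice above.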

Features

 
- Squid 3 proxy configured to work in transparent mode.
- ClamAV for virus scanning with default configuration.
- Dansguardian configured to log with squid format and contentscan with clamav.
- Sarg reading from dansguardian log to generate reports.
- Webmin modules for Squid, Dansguardian and Sarg for easy management. 
- Configure your browsers to use http://IP:8080 as a proxy. 
 

What it does

1. Set Hostname
HOSTNAME=proxy
echo "$HOSTNAME" > /etc/hostname
sed -i "s|127.0.1.1 \(.*\)|127.0.1.1 $HOSTNAME|" /etc/hosts
hostname proxy
2. Update package information
apt-get update
 
3. Install required packages [Updated: installing clamav-daemon instead of clamav]
install squid3 squid3-cgi squid3-client sarg webmin-squid webmin-sarg webmin-firewall clamav-daemon dansguardian
4. Install Dansguardian webmin module (overlayed). Currently on beta state but it worked perfectly in my tests.
/usr/share/webmin/install-module.pl /usr/share/webmin/module-archives/dgwebmin-0.7.0beta1b.wbm.gz
5. Enable clamav in dansguardian. This is done after installing dansguardian because at install time clamav doesn't have a valid virus db in place, which would prevent dansguardian from starting if this option were already enabled.

	sed -i "s|#contentscanner = '/etc/dansguardian/contentscanners/clamdscan.conf'|contentscanner = '/etc/dansguardian/contentscanners/clamdscan.conf'|" /etc/dansguardian/dansguardian.conf
sed -i "s|#clamdudsfile = '/var/run/clamav/clamd.sock'|clamdudsfile = '/var/run/clamav/clamd.ctl'|" /etc/dansguardian/contentscanners/clamdscan.conf
6. Update clamav to have a valid virus db on startup (helping dansguardian start nicely on first boot)
freshclam
 
7. Configure Squid in transparent mode (only listens in localhost for security)
sed -i "s/http_port 3128/http_port 127.0.0.1:3128 transparent/" /etc/squid3/squid.conf
8. Add clamav to dansguardian group (otherwise permissions prevent the av to scan files)
	usermod -a -G dansguardian clamav
9. Stop all services
service clamav-freshclam stop
service dansguardian stop
service squid3 stop
service apache2 stop
10. Clean apt cache
apt-get clean
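
To preview what the sed edits in steps 5 and 7 do without touching a live system, they can be run against scratch copies. The sketch below assumes the sample lines match the packaged defaults the patch targets (that is an assumption, so check your own config files) and that GNU sed is available for `-i`.

```shell
#!/bin/sh
# Apply the patch's sed edits to scratch files only; nothing under /etc is touched.
workdir=$(mktemp -d)

# Sample lines, assumed to match the packaged defaults that the patch expects.
printf '%s\n' "http_port 3128" > "$workdir/squid.conf"
printf '%s\n' "#contentscanner = '/etc/dansguardian/contentscanners/clamdscan.conf'" > "$workdir/dansguardian.conf"

# Step 7: bind Squid to localhost in transparent mode.
sed -i "s/http_port 3128/http_port 127.0.0.1:3128 transparent/" "$workdir/squid.conf"

# Step 5: uncomment the clamd content scanner line.
sed -i "s|#contentscanner = '/etc/dansguardian/contentscanners/clamdscan.conf'|contentscanner = '/etc/dansguardian/contentscanners/clamdscan.conf'|" "$workdir/dansguardian.conf"

# Show the edited lines.
cat "$workdir/squid.conf" "$workdir/dansguardian.conf"
```

If a line comes out unchanged, the pattern did not match, which usually means the installed default differs from what the patch was written against.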

Enjoy!

Adrian Moya's picture

I blocked port 3128 to prevent users from connecting directly to the proxy (squid) avoiding the web filter (dansguardian). 
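
A rule of roughly this shape would do that. It is a sketch, not necessarily the exact rule shipped in the patch: the `run` wrapper, the `APPLY` switch and the `SQUID_PORT` variable are made up for illustration, and by default the script only prints the command.

```shell
#!/bin/sh
# Print (default) or apply a rule that drops connections to Squid's port
# unless they arrive over the loopback interface.
# APPLY=1 really runs iptables and needs root.
SQUID_PORT="${SQUID_PORT:-3128}"

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

block_direct_squid() {
    run iptables -A INPUT ! -i lo -p tcp --dport "$SQUID_PORT" -j DROP
}

block_direct_squid
```

Because squid is already bound to 127.0.0.1 in step 7, this rule is a second line of defence rather than the only barrier.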

I should also mention that this is the kind of appliance that needs some know-how to use. It's not very straightforward to understand how to set it up. We should include good basic docs with this one.

Harrison's picture

Hello

With your appliance, how do you stop someone unblocking port 3128 if they are on a Windows machine?


Jeremy Davis's picture

Good work mate! This will be of interest to my work I suspect.

Dan Robertson's picture

This is actually something that I have wanted at home since my kids like getting on the internet.  I am sure many other people will find it useful.  Good job.

Adrian Moya's picture

Please try it and come back with some feedback. If you can, test that WPAD works as expected.

Liraz Siri's picture

Good stuff Adrian! Complex though. I'd love to be proven wrong but I doubt we'll be getting much feedback on this before we release it as an appliance and put it in the hands of users. Hopefully with a tutorial on how to set it up in a simple usage scenario (e.g., SOHO LAN).

Why? Because this would be the first TurnKey appliance designed for an infrastructure rather than application role. Implementing this sort of solution effectively into the network infrastructure requires skills. Heck, just understanding how it works requires skills. I expect users are going to have to be IT savvy to set it up.

Frankly I'm skeptical of using WPAD via DHCP. I'm not sure how well that is supported by clients and I don't like that it's "voluntary" on their part. I think a better way would be to set things up so that web traffic is transparently routed through the proxy.

For larger networks the firewall would be separate from the proxy. For SOHO networks the firewall and proxy could be one and the same.

But even that is too difficult. The ideal solution really would be magical. You just turn it on and your Internet traffic is filtered.

But the only way I can think of to make web filtering truly "TurnKey" is to use an ARP spoofing tool to force traffic going out to the local gateway through the proxy instead. For a real world example see "Fun with squid, imagemagick, and ARP spoofing".

BTW Adrian how did you test this?

Adrian Moya's picture

The WPAD way is the simplest (from what I read): you tell the browser to "Autodetect Proxy Settings" instead of setting them up manually. I'll check the suggestions you have made here to see if there's a simpler way.

Another option could be integrating a DHCP server into this appliance. That way, you get WPAD working out of the box. I didn't do this because it's common to have a DHCP server in your router. I think it would be more complex for users to deactivate the router's DHCP server than to add the option. Maybe it's the same degree of complexity.

My tests were simple: I opened a secondary browser (Firefox in this case) and set the proxy setting manually to my appliance. Then I tried browsing a common adult magazine page. (The first time it didn't work, so I ended up with the page opened at my job!) Once I finished working on the appliance, the expected Access Denied page came up. Now that I think about it, maybe a hacker's page should get blocked too, but I just went with the most common usage scenario (preventing adult sites). I was able to browse other pages fine. By default, though, dansguardian's rules come a bit tight, and I ended up with some false positives, so users of this appliance will have to relax them a bit. That's fine, as it's better to relax rules than to tighten them.

I couldn't test WPAD as I was at my job. I was planning to test it at home but didn't have the time. Home is where I could try other options, as I have full control over the router. But community feedback is important in this case. The proxy can be password protected, and user lists and access lists can be made for different groups of users. As this scenario is more corporate than home, I did not include this by default.

I thought about the transparent proxy option, but as I said, there are too many decisions to make without some feedback. I always think of the non-experienced user, but I don't forget about more advanced users either. So I tried to make a balanced set of options.

Thanks for your comments. There are several appliances that could go in an Infrastructure category. I'll be happy to contribute usage documentation for this and the other TKLPatches that make their way into TKL 10.04, like the LDAP appliance, which is going to need a quick guide.

I also had the idea of making recipes. If we open a "Recipes" space in the wiki, you can make a recipe for, let's say, webfilterproxy+openldap+webmail, showing how to integrate all the appliances.

P.S: The access denied page can be customized with a nice TKL-theme page. You can use your design abilities on that ;) I found other options in dansguardian website, with forms to inform the admin, but wanted to keep it simple for the appliance. But a custom TKL Access Denied page would be a nice touch.

Jeremy Davis's picture

I think for me, both at home and at work (small NGO) the most straightforward way to go will probably be to integrate with a DHCP. But I can understand your hesitation in going that path. I will test it as is first and see how it goes. If it works ok in both environments maybe I'll leave it as is (although personally I prefer to run a DHCP server under Linux anyway because IMO it is more reliable and flexible).

Reading all the ideas, options and possibilities for config really makes me think that for maximum flexibility and user friendliness, appliances like this could really do with their very own Webmin module (or similar) hooked into some helper scripts. So for example, if a user wanted to use an integrated DHCP they could just click the "integrated DHCP" radio button on the TKL Proxy page (which would move the dot from the default WPAD option) and behind the scenes a helper script would install DHCP and the appropriate Webmin module and set everything up. Ideally it could also check for a conflict on the LAN (i.e. make sure no other DHCP server is already running). But I guess that's just getting a bit complicated for now. Perhaps I should just shut up (or better still go learn some Perl...) ;)

Timothy's picture

I get an error every time I start up apache2, saying:

Failed to start apache : 
 :
 * Starting web server apache2
   ...done.

Please help me fix this. I have no idea how to fix it or why it is happening.

Adrian Moya's picture

tail -100 /var/log/boot.log

tail -100 /var/log/apache2/error.log

Does this error appear when booting up? Are you using the newest patch version? I updated it this week.

Timothy's picture

How do I force all computers to use dansguardian, without going to each computer and setting the proxy settings for dansguardian?

John Botha's picture

I'm not sure whether this is the right forum, but in reading the above, I have a number of comments. First, a bit of context.

I've been using some form of filtering at home for years now, and as my children have gotten older, so the requirements have gone up. For years I used IP-Cop successfully, until it became too much work to adapt it to my requirements. Then followed a few others, ending with eBox. When they changed name (to Zentyal) and direction, I left it. At work we also threw it out, because it was not stable, and developer responses were not up to the standards required for a small business server.

Since then I've been searching for something that works well and is easy to configure and maintain. I've been using Linux since kernel 0.99.something in early 1993, have written my own boot floppies, etc., but I have a job and family, so don't have time to fool around with settings for the fun of it.

That said, my perspective on some of what you'd need. Bear in mind that below I am assuming sane defaults, good docs, and easy configuration. Plus I'm really glad that someone is doing this: thank you!

1. Simply proxying web and mail traffic is insufficient. The appliance must also be a firewall with a default deny policy. That way you either go through the firewall and the proxies, or you surf via your phone.

2. This means that it must have two network interfaces. It is possible to do it with one, but that requires more complex configuration on both the router and the appliance. Don't eliminate the possibility; just put it under the advanced config section and document it well.

3. The appliance needs to serve DHCP.

4. For even a fairly normal home set-up of parents and children, you need users and user groups, with group-wide rights & restrictions. In my case, my wife is a therapist, and needs access to research on children. Guess what content filters have to say about that...? My son studied art, and needed access to images which were definitely unsuitable for my daughter at that time.

5. You mustn't proxy SSL (it would constitute a man-in-the-middle attack), so that would have to be let through; preferably after some form of authentication, depending on the firewall used.

6. User & group based transfer rate and transfer amount throttling: my son just loooves YouTube, and being in South Africa, our bandwidth is not yet as commoditised as the ISPs would have us believe. It would be even better if certain sites were/not throttled, depending on config.

7. Layer 7 is a very good place to do the above.

There is more, and I can list the packages which others use. Should I do so here, or privately?


Adrian Moya's picture

Hi John, thanks for your feedback. An appliance like this one is complex to build for a general usage scenario, as there are many options. Feel free to give your opinions and feedback here; this appliance surely needs a bit more work. I agree with you that the appliance should have two NICs, and the DHCP server was also an option to consider.

I'll continue improving this one based on users feedback. 

Timothy's picture

The transparent proxy doesn't work. Whenever I remove the proxy settings from Internet Explorer on Windows, I am no longer filtered; filtering only works while the proxy settings are set in Internet Explorer on my computer.

Please tell me how to fix this.

Please, someone help me with this problem.

Jeremy Davis's picture

It can only provide a proxy if the clients use it! If you still have an open connection to the internet and the privileges to change the connection settings on your PC, then you can easily subvert it. Generally, in your internet settings you would set your gateway to the proxy rather than directly to a router.

Jeremy Davis's picture

Sorry if my post was somewhat misleading.

So yes you are right, but only if the gateway is configured to refuse connections from anywhere but the proxy. If the gateway will accept any connection (like many/most consumer grade router/modems) and the user has access to the TCP/IP config on a device (so static IP and hardcoded gateway IP can be set by user - like the average Admin/sudo account for popular desktop OS, also most smartphones and other internet/network enabled devices have these features) then the proxy is easily subverted.

Having a DHCP server that directs clients automatically to the proxy (as the gateway) is part of the solution (and thanks for mentioning that - I must admit my response did sound a bit like "bad luck - that's how it is"). But while the gateway still accepts connections other than the proxy (and users have control of their devices) it can still be subverted. One way to avoid this (while still using cheap consumer grade hardware) is to have the proxy be the one and only machine connected to the modem. Then connect all other machines to the proxy via router. This way no other machines have access to the gateway except through the proxy.

Sorry if my response seemed to suggest that it can't be configured in the way you suggest - obviously it can. But it relies on features that may not be available in the sort of consumer grade hardware most use in their homes or small to medium office environments.
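
For the layout described above, with the proxy sitting between the clients and the modem, a NAT rule along these lines is the usual way to push LAN web traffic into the filter. This is a sketch under assumptions: `eth1` as the LAN-facing interface is a guess, 8080 is the DansGuardian port quoted in the first post, and the rule is only printed, not applied.

```shell
#!/bin/sh
# Print a NAT rule that redirects LAN HTTP traffic to the local filter port.
LAN_IF="${LAN_IF:-eth1}"            # assumed LAN-facing interface name
FILTER_PORT="${FILTER_PORT:-8080}"  # DansGuardian port quoted in the first post

print_intercept_rule() {
    echo "iptables -t nat -A PREROUTING -i $LAN_IF -p tcp --dport 80 -j REDIRECT --to-ports $FILTER_PORT"
}

print_intercept_rule
```

Only plain HTTP on port 80 is redirected here. HTTPS is deliberately left alone, in line with point 5 of John's list.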

Timothy's picture

How do I allow ports for DNS, samba4, etc. in iptables?

Because iptables doesn't allow input.

How would I do this?
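
One way to approach this, sketched under assumptions: the script below prints ACCEPT rules for DNS (53) and the classic SMB/NetBIOS ports (137 to 139, 445). The `print_allow_rules` name is made up for illustration, and a Samba 4 domain controller needs further ports that are not covered here.

```shell
#!/bin/sh
# Print ACCEPT rules for DNS and classic SMB/NetBIOS ports (nothing is applied).
print_allow_rules() {
    # UDP: DNS, NetBIOS name service, NetBIOS datagram service
    for port in 53 137 138; do
        echo "iptables -A INPUT -p udp --dport $port -j ACCEPT"
    done
    # TCP: DNS, NetBIOS session service, SMB over TCP
    for port in 53 139 445; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

print_allow_rules
```

Rule order matters: `-A` appends, so if a DROP or REJECT rule already sits earlier in the INPUT chain, the ACCEPT rules would need inserting with `-I` instead.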

Jeremy Davis's picture

But it is much easier (and more reliable) to use a base TKL appliance (I recommend Core) with TKLPatch installed and then patch an ISO like this:

tklpatch turnkey-core-11.1-lucid-x86.iso webfilterproxy.tar.gz

If you'd rather just patch the installed ISO (like you've done) then the easiest way to achieve this is:

Follow your initial instructions then following '58' do this instead:

apt-get update
apt-get install tklpatch
tklpatch-apply / webfilterproxy.tar.gz

Documentation on applying a TKLPatch: http://www.turnkeylinux.org/docs/tklpatch/apply
Complete TKLPatch documentation: http://www.turnkeylinux.org/docs/tklpatch

PS Hopefully the TKL devs will release part 2 of v11.x sometime soon (it's already well overdue!) which will include this as an official TKL appliance.

Jeremy Davis's picture

Hi Adrian. I was going to have a play with this patch and see what I could do with this appliance, but unfortunately the download seems to be corrupt. I tried downloading it a few times but got the same issue every time.

tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

It's ok, I downloaded it straight into a TKL appliance (using wget) and renamed it and it seems fine. It was also ~200B smaller too. Seems that the web browser was doing something bad to it... (Using Firefox 6.0.2 under Bodhi Linux 1.0.1 - based on Ubuntu 10.04)
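
A quick way to catch this kind of corrupt download before applying the patch is to ask tar to list the archive. The following is a sketch: `is_valid_targz` is a hypothetical helper, and the default file name is the one used in the tklpatch command earlier in the thread.

```shell
#!/bin/sh
# Succeeds only if the file can be listed as a gzip-compressed tar archive.
is_valid_targz() {
    tar -tzf "$1" >/dev/null 2>&1
}

# File name taken from the tklpatch example; pass another path as $1 if needed.
PATCH="${1:-webfilterproxy.tar.gz}"

if is_valid_targz "$PATCH"; then
    echo "$PATCH looks like a valid archive"
else
    echo "$PATCH is missing or not a valid tar.gz; try fetching it again with wget" >&2
fi
```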

Adrian Moya's picture

Please share your experiences with this one.

Jeremy Davis's picture

It works very well straight out of the box. I have only done some preliminary testing so far but have been very happy with it, and to be honest, amazed how it all just works. Nice touch including the DansGuardian (and Squid) Webmin module(s) (although I haven't played with it/them much yet).

I noticed that the ClamAV Webmin module wasn't installed and had a crack at installing it, but it needs heaps of Perl dependencies, some of which I cannot seem to fulfill from either the Ubuntu repos or CPAN, so I'm not sure where to go from there. (Did you experience the same? Or just not attempt this?)

As for AV and URL blocklist updates, is the appliance configured to grab these automatically somewhere/somehow? I had a bit of a look but couldn't see anything much (although in honesty I haven't had a really good look). [update: I just read that ClamAV can be set to auto update so I guess you probably did that?] In fact I couldn't see any free URL blocklists that are suitable (there is a reasonably priced one that works on an honor pay system, but again I didn't look very hard). I'm curious about your thoughts on this. Considering how well it works off the bat (and the patch is somewhat dated now), perhaps the phrase filtering system is adequate? Surely users would at least need some sort of AV updates though?

Anyway I haven't yet set it up properly, just manually configured my browser to use it as a proxy. The next step is to set it up properly so all browsers on site automatically connect through it and it can't be subverted (at least not easily). I have done a little reading in this regard and will have a crack at it sometime soon. I will document my experience and post back.

Also out of interest, do you think it would be hard to integrate a web caching proxy function into this appliance too, or would that be a bit tricky?

Jeremy Davis's picture

I have set up a WebFilterProxy (WFP) appliance (as a PVE guest) and set my modem/router to deny all ethernet connections (except WFP). I suspect to get this to work how I want I'm going to need to use a separate router (with wifi) and just have the original modem/router running as a modem only (just connected to my PVE host) as it looks like blocking direct connections stops it connecting to anything at all except through the proxy. I haven't adjusted DHCP yet so I am setting the gateway (pointing to WFP) manually. Anyway, using my Netbook (running Bodhi Linux 1.2.0 [Ubuntu 10.04 based with lots of updated apps] - but allowed to connect to the wifi router) and have set WFP as the 'gateway'.

A few things I have found so far:

  • I still need to manually configure my web browser to use the proxy (Firefox 6.0.1). Otherwise web browsing results in time outs.
  • For HTTPS websites to work, the browser must be set to use the proxy for SSL connections. Unfortunately this seems to mean that Webmin/Webshell/phpMyAdmin/etc (ie anything that uses https on non-standard ports) doesn't work (it always seems to try to use 443). To work around this for accessing these on my PVE guests I made a proxy exception for LAN traffic. But with WFP (and my netbook) as the only thing(s) allowed to connect it still doesn't work. (I need to allow the VMs to connect to the router to get it to work - ie sidestep the proxy).
  • DNS must be supplied from within the LAN (when I tried to use Google DNS - 8.8.8.8 - it times out).
  • apt/Synaptic must also be manually set to use the proxy (otherwise it can't get a connection).
  • No connections to anything other than http and https outside of my LAN - no matter what I do I can't seem to get it to connect to SSH or SFTP (or any other https using any port other than 443).
  • On further use it seems that DansGuardian is doing some very aggressive 'phrase' filtering. It won't let me connect to many FaceBook pages - because apparently they have "Pornography (Japanese)" there! Also I found fairly simple google searches are also blocked (claiming various pornography - including Norwegian!). Even some of the DG online config pages are blocked! I have tried to exclude sites/urls from filtering (both through Webmin and manually) but it doesn't seem to make any difference (and yes I remembered to restart DG).

So I'm not really sure where to go from here on in. I've searched fairly extensively online and nothing is really helping me out here. Be pleased to get any ideas.
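
For the apt point in the list above, the proxy can be set once in an apt configuration fragment instead of per session. This is a sketch: `write_apt_proxy` is a hypothetical helper, the address 192.168.1.10 is a placeholder rather than anything from this thread, and the output goes to a scratch file by default instead of /etc/apt/apt.conf.d/.

```shell
#!/bin/sh
# Write an apt configuration fragment that sends HTTP downloads via the proxy.
write_apt_proxy() {
    proxy_host=$1
    proxy_port=$2
    target=$3
    printf 'Acquire::http::Proxy "http://%s:%s/";\n' "$proxy_host" "$proxy_port" > "$target"
}

PROXY_HOST="${PROXY_HOST:-192.168.1.10}"  # placeholder address of the WFP appliance
OUT="${OUT:-$(mktemp)}"                   # scratch file; copy into /etc/apt/apt.conf.d/ once checked

write_apt_proxy "$PROXY_HOST" 8080 "$OUT"
cat "$OUT"
```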

Adrian Moya's picture

I think this kind of appliance is meant to be run with two NICs, one connected to the LAN and the other to the internet. It's been so long since I made this appliance that I don't remember the details, but here are some thoughts:

- ClamAV: I remember installing the ClamAV Webmin module and finally discarding it, as I felt it didn't play well with the appliance (I don't remember exactly why, but I think it was because clamav is installed as a daemon and not as the client). It should update automatically. And it kicks in only after downloading files.

- EVERYTHING that requests access to the internet should pass through this filter. That's why other services and ports have to be configured, maybe in the appliance's firewall, so you can access those services. That's also why I think you'll be best off with two NICs and two routers. Ideally, all LAN services should be freely accessible from the LAN; you are just blocking internet access. Everything else is rules in the firewall to redirect traffic to the outside or to the WFP.

- I couldn't get auto configuration for browsers working completely. 

- Squid, IIRC, is already caching web requests, so the appliance includes that feature.

- You have two approaches to filtering: start with light rules and tighten them, or start with strict rules and loosen them. The default in the appliance is to start strict. So, using the Webmin module, you can start teaching the appliance that not everything is pornography :)

Sorry for not being more specific/helpful. This one is a complex appliance, and every scenario could be different. And I forgot all my research on the topic :P

Jeremy Davis's picture

I had a play with this some time ago and it was OK. But I suspect that, considering its age (it was developed on/for the TKL v11 beta), it probably needs serious reworking before it could be a contender for release on v13...

Even in a best case scenario (i.e. if this customised appliance were 'ready to roll') it wouldn't be released until TKL v13. And that won't release until official Debian 7 is made 'stable' and who knows when that will be... (although it's looking like it may not be too far away... 7.0RC1 released Feb 17).

Humphrey Davy's picture

I keep getting these. Is that normal?

 


	Hit http://cdn.debian.net squeeze/contrib amd64 Packages
Reading package lists... Done
+ install squid3 squid3-cgi squid3-client sarg webmin-squid webmin-sarg webmin-firewall clamav-daemon dansguardian
+ DEBIAN_FRONTEND=noninteractive
+ apt-get -y -o DPkg::Options::=--force-confdef -o DPkg::Options::=--force-confold install squid3 squid3-cgi squid3-client sarg webmin-squid webmin-sarg webmin-firewall clamav-daemon dansguardian
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'squid-cgi' instead of 'squid3-cgi'
Note, selecting 'squidclient' instead of 'squid3-client'
Package sarg is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
 
E: Package 'sarg' has no installation candidate 
Humphrey Davy's picture

Hope this helps a newbie like me.

 

I used this tutorial to install sarg:

How to install SARG on Debian 6.0

 
SARG is not included in the default apt sources of Debian Squeeze. To solve this problem, you need to add the "squeeze-backports" apt source in your "/etc/apt/sources.list".
 
1. Open your apt source file.
$ nano /etc/apt/sources.list
 
2. Add the "squeeze-backports" at the end of the file. Add the following line and save.
 
 
3. Update your system.
$ apt-get update
 
jm2k7@hotmail.com's picture

I followed the guide, but when I start using the proxy it throws this error:

WARNING: Could not perform virus scan! 

This error occurs when I try to open any page.

I worked around it by modifying these entries:

 

/etc/dansguardian/dansguardianf1.conf
naughtynesslimit = 160
disablecontentscan = on
 
If anyone knows why it occurs, I would appreciate a hand.
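
One thing that may be worth checking before disabling content scanning is whether the clamd socket that DansGuardian is pointed at actually exists. The sketch below assumes the path from step 5 of the first post; `clamd_socket_path` is a hypothetical helper, and a missing socket is only one possible cause of this warning.

```shell
#!/bin/sh
# Print the socket path from the first active clamdudsfile line of a
# DansGuardian content scanner config (commented-out lines are ignored).
clamd_socket_path() {
    sed -n "s/^[[:space:]]*clamdudsfile[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | head -n 1
}

CONF="${CONF:-/etc/dansguardian/contentscanners/clamdscan.conf}"

if [ -r "$CONF" ]; then
    sock=$(clamd_socket_path "$CONF")
    if [ -S "$sock" ]; then
        echo "clamd socket present: $sock"
    else
        echo "no socket at '$sock': is clamav-daemon running?" >&2
    fi
else
    echo "cannot read $CONF" >&2
fi
```

If the socket exists, step 8 of the first post (adding clamav to the dansguardian group) is the next thing to verify, since the post says permissions can also stop the scan.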
 
thanks.-
 
 
 
 
Jeremy Davis's picture

And as it never actually got released we haven't maintained it. So right now you know more about this than us! :)
