Wednesday, September 24, 2014

Tips for creating better bash scripts

For today's article I wanted to cover a few tips that will make your scripts better: not just better for you, but better for the next person who has to figure out why they aren't working anymore.

Always start with a shebang

The first rule of shell scripting is that you always start your scripts with a shebang. While the name might sound funny, the shebang line is very important: it tells the system which binary to use as the interpreter for the script. Without the shebang line, the system doesn't know what language to use to process the script.
A typical bash shebang line would look like the following:
#!/bin/bash
I have seen many scripts where this line is missing. One would think that if it wasn't there then the script wouldn't work, but that's not necessarily true. If a script does not have an interpreter specified then some systems will default to /bin/sh. While defaulting to /bin/sh would be ok if the script is written for the Bourne shell, if the script is written for KSH or uses something specific to bash and not Bourne then the script may produce unexpected results.
Unlike some of the other tips in this article this one is not just a tip but rather a rule. You must always start your shell scripts with the interpreter line; without it your scripts will eventually fail.
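As a side note, on systems where bash is not installed at /bin/bash (some BSDs and minimal environments put it elsewhere), a common portable alternative is to let env locate the interpreter on the user's PATH:
#!/usr/bin/env bash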

Put a description of the script in the header

Whenever I create a script, or any program for that matter, I always try to put a description of what the script will do at the beginning of the script. I also include my name, and if I am writing the script for work I will include my work email address, as well as the date the script was written.
Here is an example header.
#!/bin/bash
#### Description: Adds users based on provided CSV file 
#### CSV file must use : as separator
#### uid:username:comment:group:addgroups:/home/dir:/usr/shell:passwdage:password
#### Written by: Benjamin Cane - ben@example.com on 03-2012
Why do I put all of this? Well it's simple. The description is there to explain to anyone reading through the script what it does and any information they need to know about it. I include my name and email on the chance that someone reading the script has a question for me and wants to reach out and ask. I include the date so that when they are reading the script they at least have some context as to how long ago it was written. The date also adds a bit of nostalgia when you find a script you've written long ago and ask yourself "What was I thinking when I wrote this?".
A descriptive header in your scripts can be as custom as you want it to be; there is no hard and fast rule about what needs to be in there and what doesn't. In general, just keep it informative and make sure you put it at the top of the script.

Indent your code

Making your code readable is very important, and it's something that a lot of people seem to forget. Before we get too far into why indentation is important, let's look at an example.
NEW_UID=$(echo $x | cut -d: -f1)
NEW_USER=$(echo $x | cut -d: -f2)
NEW_COMMENT=$(echo $x | cut -d: -f3)
NEW_GROUP=$(echo $x | cut -d: -f4)
NEW_ADDGROUP=$(echo $x | cut -d: -f5)
NEW_HOMEDIR=$(echo $x | cut -d: -f6)
NEW_SHELL=$(echo $x | cut -d: -f7)
NEW_CHAGE=$(echo $x | cut -d: -f8)
NEW_PASS=$(echo $x | cut -d: -f9)    
PASSCHK=$(grep -c ":$NEW_UID:" /etc/passwd)
if [ $PASSCHK -ge 1 ]
then
echo "UID: $NEW_UID seems to exist check /etc/passwd"
else
useradd -u $NEW_UID -c "$NEW_COMMENT" -md $NEW_HOMEDIR -s $NEW_SHELL -g $NEW_GROUP -G $NEW_ADDGROUP $NEW_USER
if [ ! -z $NEW_PASS ]
then
echo $NEW_PASS | passwd --stdin $NEW_USER
chage -M $NEW_CHAGE $NEW_USER
chage -d 0 $NEW_USER 
fi
fi
 
Does the above code work? Yes, but it's not very pretty, and if this was a 500-line bash script without any indentation it would be pretty hard to understand what's going on. Now let's look at the same code with indentation.

NEW_UID=$(echo $x | cut -d: -f1)
NEW_USER=$(echo $x | cut -d: -f2)
NEW_COMMENT=$(echo $x | cut -d: -f3)
NEW_GROUP=$(echo $x | cut -d: -f4)
NEW_ADDGROUP=$(echo $x | cut -d: -f5)
NEW_HOMEDIR=$(echo $x | cut -d: -f6)
NEW_SHELL=$(echo $x | cut -d: -f7)
NEW_CHAGE=$(echo $x | cut -d: -f8)
NEW_PASS=$(echo $x | cut -d: -f9)    
PASSCHK=$(grep -c ":$NEW_UID:" /etc/passwd)
if [ $PASSCHK -ge 1 ]
then
  echo "UID: $NEW_UID seems to exist check /etc/passwd"
else
  useradd -u $NEW_UID -c "$NEW_COMMENT" -md $NEW_HOMEDIR -s $NEW_SHELL -g $NEW_GROUP -G $NEW_ADDGROUP $NEW_USER
  if [ ! -z $NEW_PASS ]
  then
      echo $NEW_PASS | passwd --stdin $NEW_USER
      chage -M $NEW_CHAGE $NEW_USER
      chage -d 0 $NEW_USER 
  fi
fi
 
With the indented version it is a lot more apparent that the second if statement is nested within the first; you might not catch that at first glance when looking at the un-indented code.
The style of indentation is up to you. Whether you use 2 spaces, 4 spaces, or just a generic tab doesn't really matter; what matters is that the code is consistently indented the same way every time.

Add Spacing

Where indentation helps make code understandable, spacing helps make code readable. In general I like to space code out based on what the code is doing; again this is a preference, and really the point is just to make the code more readable and easier to understand.
Below is an example of spacing with the same code as above.

NEW_UID=$(echo $x | cut -d: -f1)
NEW_USER=$(echo $x | cut -d: -f2)
NEW_COMMENT=$(echo $x | cut -d: -f3)
NEW_GROUP=$(echo $x | cut -d: -f4)
NEW_ADDGROUP=$(echo $x | cut -d: -f5)
NEW_HOMEDIR=$(echo $x | cut -d: -f6)
NEW_SHELL=$(echo $x | cut -d: -f7)
NEW_CHAGE=$(echo $x | cut -d: -f8)
NEW_PASS=$(echo $x | cut -d: -f9)

PASSCHK=$(grep -c ":$NEW_UID:" /etc/passwd)
if [ $PASSCHK -ge 1 ]
then
  echo "UID: $NEW_UID seems to exist check /etc/passwd"
else
  useradd -u $NEW_UID -c "$NEW_COMMENT" -md $NEW_HOMEDIR -s $NEW_SHELL -g $NEW_GROUP -G $NEW_ADDGROUP $NEW_USER

  if [ ! -z $NEW_PASS ]
  then
      echo $NEW_PASS | passwd --stdin $NEW_USER
      chage -M $NEW_CHAGE $NEW_USER
      chage -d 0 $NEW_USER 
  fi
fi
 
As you can see the spacing is subtle but every little bit of cleanliness can help make the code easier to troubleshoot later.

Comment your code

Where the header is great for adding a description of the script's function, adding comments within the code is great for explaining what's going on within the code itself. Below I will show the same code snippet from above, but this time with comments added that explain what it does.

## Parse $x (the csv data) and put the individual fields into variables
NEW_UID=$(echo $x | cut -d: -f1)
NEW_USER=$(echo $x | cut -d: -f2)
NEW_COMMENT=$(echo $x | cut -d: -f3)
NEW_GROUP=$(echo $x | cut -d: -f4)
NEW_ADDGROUP=$(echo $x | cut -d: -f5)
NEW_HOMEDIR=$(echo $x | cut -d: -f6)
NEW_SHELL=$(echo $x | cut -d: -f7)
NEW_CHAGE=$(echo $x | cut -d: -f8)
NEW_PASS=$(echo $x | cut -d: -f9)

## Check if the new userid already exists in /etc/passwd
PASSCHK=$(grep -c ":$NEW_UID:" /etc/passwd)
if [ $PASSCHK -ge 1 ]
then
  ## If it does, skip
  echo "UID: $NEW_UID seems to exist check /etc/passwd"
else
  ## If not add the user
  useradd -u $NEW_UID -c "$NEW_COMMENT" -md $NEW_HOMEDIR -s $NEW_SHELL -g $NEW_GROUP -G $NEW_ADDGROUP $NEW_USER

  ## Check if new_pass is empty or not
  if [ ! -z $NEW_PASS ]
  then
      ## If not empty set the password and pass expiry
      echo $NEW_PASS | passwd --stdin $NEW_USER
      chage -M $NEW_CHAGE $NEW_USER
      chage -d 0 $NEW_USER 
  fi
fi
 
If you were to happen upon this snippet of bash code and didn't know what it did, you could at least look at the comments and get a pretty good understanding of what the goal is. Adding comments to your code will be extremely helpful to the next guy, and it might even help you out. I've found myself looking through a script I wrote maybe a month earlier wondering what I was doing. If you add comments religiously it can save you and others a lot of time later.

Create descriptive variable names

Descriptive variable names might seem like an obvious thing but I find myself using generic variable names all the time. Most of the time these variables are temporary and never used outside of that single code block, but even with temporary variables it is always good to put an explanation of what values they contain.
Below is an example of variable names that are mostly descriptive.

for x in `cat $1`
do
    NEW_UID=$(echo $x | cut -d: -f1)
    NEW_USER=$(echo $x | cut -d: -f2)
 
While it might be pretty obvious what goes into $NEW_UID and $NEW_USER it is not necessarily obvious what the value of $1 is, or what is being set as $x. A more descriptive way of writing this same code can be seen below.

INPUT_FILE=$1
for CSV_LINE in `cat $INPUT_FILE`
do
  NEW_UID=$(echo $CSV_LINE | cut -d: -f1)
  NEW_USER=$(echo $CSV_LINE | cut -d: -f2)
 
With the rewritten block of code it is very apparent that we are reading an input file and that file is a CSV file. It is also more apparent where we are getting the new UID and new USER information to store in the $NEW_UID and $NEW_USER variables.
The example above might seem like a bit of overkill, but someone may thank you later for taking a little extra time to be more descriptive with your variables.

Use $(command) for command substitution

If you want to create a variable whose value is derived from another command, there are two ways to do it in bash. The first is to wrap the command in back-ticks, as in the example below.

DATE=`date +%F`
 
The second method uses a different syntax.

DATE=$(date +%F)
 
While both are technically correct, I personally prefer the second method. This is purely personal preference, but in general I think the $(command) syntax is more obvious than using back-ticks. Let's say for example you are digging through hundreds of lines of bash code; you may find as you read that sometimes those back-ticks start looking like single quotes, and sometimes a single quote starts looking like a back-tick. At the end of the day, it all comes down to preference. Use what works best for you; just make sure you are consistent with the method you choose.
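One other point worth mentioning: $(command) nests cleanly, while back-ticks have to be escaped when nested. A small sketch (the readlink usage here is just an illustration):

## Nesting with $(command) reads naturally
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")

## The same nesting with back-ticks requires escaping the inner pair
SCRIPT_DIR=`dirname \`readlink -f "$0"\``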

Before you exit on error describe the problem

We have gone through several examples of items that make it easier to read and understand code, but this last one is useful before the troubleshooting process even gets to that point. By adding descriptive errors to your scripts you can save someone a lot of troubleshooting time early on. Let's take a look at the following code and see how we can make it more descriptive.

if [ -d $FILE_PATH ]
then
  for FILE in $(ls $FILE_PATH/*)
  do
    echo "This is a file: $FILE"
  done
else
  exit 1
fi
 
The first thing this script does is check whether the value of the $FILE_PATH variable is a directory. If it isn't, it will exit with a code of 1, which denotes an error. While it's great that we used an exit code that tells other scripts this script was not successful, it doesn't explain that to the humans running it.
Let's make the code a little more human friendly.

if [ -d $FILE_PATH ]
then
  for FILE in $(ls $FILE_PATH/*)
  do
    echo "This is a file: $FILE"
  done
else
  echo "exiting... provided file path does not exist or is not a directory"
  exit 1
fi
 
If you were to run the first snippet successfully, you would expect a large amount of output. If you didn't get that output, you would have to open the script up to see what could have possibly gone wrong. If you were to run the second code snippet, however, you would know instantly that the path you gave the script wasn't valid. Adding just one line of code can save a lot of troubleshooting later.
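One small refinement worth noting: error messages are conventionally written to standard error rather than standard out, so they don't get mixed into the script's normal output when someone redirects it. The else branch above could be written as:

  echo "exiting... provided file path does not exist or is not a directory" >&2
  exit 1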
The above examples are just a few things I try to use whenever I write scripts. I'm sure there are other great tips for writing clean and readable bash scripts; if you have any, feel free to drop them in the comments box. It's always good to see what tricks others come up with.

MySQL: tee saving your output to a file

tee is a Unix command that takes the standard output of a command and writes it to both your terminal and a file. Until recently I never knew there was a MySQL client command that performed the same function. Today I will show you an example of how to use it.

First login to the MySQL CLI (command line interface)
 $ mysql -uroot -p
 Enter password:
 Welcome to the MySQL monitor.  Commands end with ; or \g.
 Your MySQL connection id is 24839

Once you are in you will have a command prompt. From there simply type tee and the file path and name you want to save to.
 mysql> tee /var/tmp/mysql_tee.out
 Logging to file '/var/tmp/mysql_tee.out'
 mysql> use mysql;
 Reading table information for completion of table and column names
 You can turn off this feature to get a quicker startup with -A

 Database changed
 mysql> show tables;
 +---------------------------+
 | Tables_in_mysql           |
 +---------------------------+
 | columns_priv              |
 | db                        |
 | event                     |
 | func                      |
 | general_log               |
 | help_category             |
 | help_keyword              |
 | help_relation             |
 | help_topic                |
 | host                      |
 | ndb_binlog_index          |
 | plugin                    |
 | proc                      |
 | procs_priv                |
 | servers                   |
 | slow_log                  |
 | tables_priv               |
 | time_zone                 |
 | time_zone_leap_second     |
 | time_zone_name            |
 | time_zone_transition      |
 | time_zone_transition_type |
 | user                      |
 +---------------------------+
 23 rows in set (0.00 sec)
 
Now just check your file and make sure your commands were logged.
 $ ls -la /var/tmp/mysql_tee.out
 -rw-r--r-- 1 ubuntu ubuntu 1030 2011-12-22 01:43 /var/tmp/mysql_tee.out
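When you are done capturing output, the MySQL client also provides a notee command that stops the logging:
 mysql> notee
 Outfile disabled.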

IPtables: Linux firewall rules for a basic Web Server

What is iptables?

iptables is a package and kernel module for Linux that uses the netfilter hooks within the Linux kernel to provide filtering, network address translation, and packet mangling. iptables is a powerful tool for turning a regular Linux system into a simple or advanced firewall.

Firewall & iptables basics

Rules are first come first serve

In iptables, much like other (but not all) firewall filtering packages, the rules are presented in a list. When a packet is being processed, iptables reads through its rule-set list, and the first rule that completely matches the packet gets applied.
For example if our rule-set looks like below, all HTTP connections will be denied:
  1. Allow all SSH Connections
  2. Deny all connections
  3. Allow all HTTP Connections
If the packet was for SSH it would be allowed because it matches rule #1, HTTP traffic on the other hand would be denied because it matches both rule #2 and rule #3. Because rule #2 says Deny all connections the HTTP traffic would be denied.
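For the HTTP traffic to be allowed in this example, the allow rule would simply need to be moved above the deny-all rule:
  1. Allow all SSH Connections
  2. Allow all HTTP Connections
  3. Deny all connections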
This is an example of why order matters with iptables, keep this in mind as we will see this later in this article.

Two policies for filtering

When creating filtering rule-sets there are two methods: explicit allow or explicit deny.

Explicit Allow (Default Deny)

Explicit Allow uses a default deny policy: when creating these types of rules the base policy is a deny-all rule. Once you have a base policy that denies all traffic you must add rules (in the proper order) that explicitly allow access.
Explicit Allow is by default more secure than Explicit Deny, because only traffic that you explicitly allow is permitted into the system. The trade-off, however, is that you must manage a sometimes complicated rule set; if the rule set is not correct you may even end up locking yourself out of the system. It is not that uncommon to lock yourself out of a system when creating an explicit allow policy.

Explicit Deny (Default Allow)

Explicit Deny uses a default allow policy: when creating these types of rules the base policy is to allow all. Once you have a base policy that allows all traffic you can create additional rules that explicitly deny access.
When using Explicit Deny only traffic that you specifically block is denied. This means that your system is more vulnerable if you forget to block a port that should be blocked; however, it also means that you are less likely to lock yourself out of the system.
Because we are creating a filtering rule set for a public-facing web server we will be covering an Explicit Allow policy, as it is more secure and will show you some best practices for creating iptables rules.

Chains and Tables

Tables

Tables are used by iptables to group rules based on their function. The 3 table types available are Filter, NAT, and Mangle.
  • Filter - The filter table is used to filter network traffic based on allow or deny rules the user has put in place. This is the default table to use if no tables were given in a command.
  • NAT - The NAT Table is used for Network Address Translation. This is useful for configuring your Linux System as a router or port forwarding.
  • Mangle - Mangle is used for packet mangling; this is mainly useful for changing the TOS or TTL on packets.
For our examples today we are focusing on only the filter table, which is also the default. When running iptables commands we will not need to specify the table we are using in our examples.

Chains

Chains, on the other hand, are used to organize rules based on the source of the packets. The filter table has 3 default chains, INPUT, OUTPUT, and FORWARD; these chains are used to separate packets based on their source.
  • INPUT - This chain is used for packets that are incoming on the system from an outside source.
  • OUTPUT - This chain is used for packets that are outgoing from the system to an outside source.
  • FORWARD - This chain is used for packets that are being routed through the system, such as traffic forwarded via NAT rules; this allows you to filter traffic that is also NAT'ed.
In addition to the chains above, custom chains can be created as well. Placing rules in custom chains can allow packets from specific sources to match faster without being run through unnecessary rules.
We will save creating custom chains for another day as you do not really need them for a basic web server.

Creating a rule set for a basic web server

Verify iptables is installed

Before starting to write iptables rules, let's validate that the package is installed and the iptables module is loaded.

Verifying the package is installed (Ubuntu/Debian)

 # dpkg --list | grep iptables
 ii iptables 1.4.12-1ubuntu4 administration tools for packet filtering and NAT

Verifying the Kernel Module is loaded

 # lsmod | grep ip_tables
 ip_tables 18106 1 iptable_filter
Once you have validated that iptables is loaded and ready, we can start creating rules.

Creating iptables rules

When implementing iptables there are two methods. Rules can be added to a file that gets loaded on reboot/restart of the iptables services, or rules can be added live and then saved to the file that is loaded on reboot/restart of iptables. For today's examples I will show adding the rules live and saving them to a file once complete.
I prefer this method as it gives me the ability to simply reboot the system and regain access if I mess up any rules. While that works for virtual machines and physical machines that are close by, it is a real concern when applying firewall rules to remote machines that you do not have quick access to.
Always triple-check your rules before applying them. As a best practice, if you are applying the rules to a remote machine, test them on a machine that is close by first.
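One safety net I find useful on remote machines (a personal habit, not a requirement of iptables) is to schedule a rule flush in the background before making changes; if a bad rule locks you out, access is restored when the timer fires, and if everything works you simply kill the job:
 # nohup bash -c 'sleep 300; iptables -F; iptables -P INPUT ACCEPT' >/dev/null 2>&1 &
If you can still connect after applying your rules, kill the background job before the five minutes are up and your rules stay in place.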

Show current rules

Before we start creating rules I want to cover how to show rules that are currently being enforced. We will use this command often to verify whether our rules are in place or not. The below command shows no rules are in place.
 # iptables -L
 Chain INPUT (policy ACCEPT)
  target prot opt source destination

 Chain FORWARD (policy ACCEPT)
  target prot opt source destination

 Chain OUTPUT (policy ACCEPT)
  target prot opt source destination

Add port 22 before you add the base policy

No matter if I am adding an explicit deny or explicit allow base policy, I always add a rule that allows me access to port 22 (SSH) as the first rule. This ensures that SSH is always available to me even if I mess up the other rules.
Since my SSH connections are coming from the IP 192.168.122.1 I will explicitly allow all port 22 connections from the IP 192.168.122.1.
 # iptables -I INPUT -p tcp --dport 22 -s 192.168.122.1 -j ACCEPT
 # iptables -L
 Chain INPUT (policy ACCEPT)
 target prot opt source destination
 ACCEPT tcp -- 192.168.122.1 anywhere tcp dpt:ssh
Let's break this command down a bit and get a better understanding of iptables syntax.
-I INPUT
The -I flag tells iptables to take this rule and insert it into the INPUT chain. This will put the given rule as the first rule in the set; there are other methods of adding rules such as appending the rule or adding the rule into a specific line that we will show in the next few examples.
-p tcp
The -p flag is the part of the rule that tells iptables what protocol we want to match on, in this case TCP.
--dport 22
The --dport flag is short for destination port and is used to specify that we are only looking to match on traffic destined for port 22.
-s 192.168.122.1
The -s flag, as you can probably guess, is used to specify the source of the traffic; in our case the source of our port 22 traffic will be 192.168.122.1. As a note, you must be careful to ensure you have the proper source when applying this rule.
-j ACCEPT
The -j flag tells iptables to jump to the action for this packet, these are known in iptables as targets. A target can be a chain or one of the predefined targets within iptables, the target in our example will tell iptables to ACCEPT the packet and allow it to pass through the firewall.
If you have a dynamic IP, the address you are trying to access the server from may change. If that is the case, it may be preferable to leave out the source, which would allow port 22 traffic from any address.
Example to allow SSH traffic from all sources:
 # iptables -I INPUT -p tcp --dport 22 -j ACCEPT

Changing the base policy to default deny (Explicit Allow)

Now that we have the SSH traffic allowed we can change the base policy of the INPUT chain to deny all; this means that anything that does not match any of our rules will be denied. In some filtering systems this needs to be done by adding a rule at the end of the set that denies everything. In iptables, however, this can be done by changing the INPUT chain's default policy.
 # iptables -P INPUT DROP
 # iptables -L
 Chain INPUT (policy DROP)
 target prot opt source destination
 ACCEPT tcp -- 192.168.122.1 anywhere tcp dpt:ssh
The -P flag stands for policy; in this command we are setting the default policy of the INPUT chain to the target DROP.
DROP vs REJECT
Within the default policy I showed using the DROP target. There are two targets in iptables that can deny a packet, DROP and REJECT, and there is a very small but possibly very impactful difference between them.
The REJECT target will send a reply icmp packet to the source system telling that system that the packet has been rejected. By default the message will be "port is unreachable".
The DROP target simply drops the packet without sending any reply packets back.
The REJECT target is vulnerable to DoS-style attacks: every packet that is rejected causes iptables to send back an ICMP reply, and when an attack comes in volume your system sends ICMP replies in volume as well. In this scenario it is better to simply drop the packets and reduce traffic congestion as best you can.
For this reason I suggest using DROP as the default policy.
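If you do want the politer REJECT behavior for a specific rule, the reply type can be chosen with the --reject-with option. For example, rejecting ident (port 113) traffic with a TCP reset is a common use; the port here is just an illustration:
 # iptables -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset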

Appending our web server rules

Now that we have the default policy and SSH rules in place we need to add rules for the other services we want to make accessible. The first rules we will add accept all traffic destined for ports 80 and 443 for web traffic.
 # iptables -A INPUT -p tcp --dport 80 -j ACCEPT
 # iptables -A INPUT -p tcp --dport 443 -j ACCEPT
 # iptables -L -n
 Chain INPUT (policy DROP)
 target prot opt source destination
 ACCEPT tcp -- 192.168.122.1 0.0.0.0/0 tcp dpt:22
 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:443
As you can see in the above example we added the rules by using -A INPUT. While -I is used to insert a rule at the beginning of a chain's rule-set, the -A flag is used to append a rule at the end of the rule-set.

Allowing Loopback Traffic

In addition to allowing web server traffic we should also allow all traffic on the loopback interface. This is necessary if you are running a local MySQL server and connecting via localhost, but it is also a good idea in general for the loopback interface.
 # iptables -I INPUT -i lo -j ACCEPT
 # iptables -L -n
 Chain INPUT (policy DROP)
 target prot opt source destination
 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
 ACCEPT tcp -- 192.168.122.1 0.0.0.0/0 tcp dpt:22
We can tell iptables to allow all traffic on a specific interface with the -i flag, and because we didn't add -p the default action is to allow all protocols.
I placed this rule first in the chain because in general there is quite a bit of traffic on the loopback interface and having the rule earlier in the ruleset provides faster matching and allows the kernel to spend less time trying to match the packets.

Allowing FTP Traffic

FTP traffic can be tricky when it comes to firewalls; that is because FTP uses multiple ports when transferring data and sometimes not the same port.
Active mode FTP uses port 21 for a control channel and port 20 for data. This is easily enough accounted for with the first two iptables rules in the example below.
Passive mode FTP, however, is not as simple. Once a client has connected to the FTP server's control port, the server establishes a higher port for the client to connect to for data. Because this higher port is often within a large range, it is difficult to open up the entire range without potentially allowing malicious traffic to unwanted ports.
For this scenario iptables uses another module called ip_conntrack. ip_conntrack tracks established connections and allows iptables to create rules that allow related connections to be accepted.
This allows for the FTP connection to establish on port 21 with the first rule in the list and then establish a connection with a higher port via the third rule.
 # iptables -A INPUT -p tcp --dport 21 -j ACCEPT
 # iptables -A INPUT -p tcp --dport 20 -j ACCEPT
 # iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
 # iptables -L -n
 Chain INPUT (policy DROP)
 target prot opt source destination
 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:21
 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:20
 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
In the third rule being applied, the -m flag stands for match; this allows iptables to match connections based on the connection state (--ctstate) ESTABLISHED or RELATED.
RELATED is a connection state within iptables that matches new connections that are related to already established connections. This is mainly used with FTP-based traffic.
You may also want to restrict FTP traffic to a certain network; to do this, simply add the -s flag and the appropriate IP or IP range to the first and second iptables rules.
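For example, assuming the trusted network were 192.168.122.0/24 (an illustrative range), the control channel rule would become:
 # iptables -A INPUT -p tcp --dport 21 -s 192.168.122.0/24 -j ACCEPT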
Loading the ip_conntrack_ftp module
In order for conntrack to work properly you must ensure that the ip_conntrack_ftp module is loaded.
 # modprobe ip_conntrack_ftp
 # lsmod | grep conntrack
 nf_conntrack_ftp 13452 0
 nf_conntrack_ipv4 19716 1
 nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
 xt_conntrack 12760 1
 nf_conntrack 81926 4 nf_conntrack_ftp,xt_state,nf_conntrack_ipv4,xt_conntrack
 x_tables 29846 7 xt_state,ip6table_filter,ip6_tables,xt_tcpudp,xt_conntrack,iptable_filter,ip_tables

Allowing DNS Traffic

DNS-based traffic is not as tricky as FTP; however, DNS primarily uses UDP rather than TCP, and some DNS traffic can travel over TCP as well. In order to allow DNS traffic you must run 2 iptables commands, one opening the port for TCP and the other opening the port for UDP.
 # iptables -A INPUT -p tcp --dport 53 -j ACCEPT
 # iptables -A INPUT -p udp --dport 53 -j ACCEPT
 # iptables -L -n
 Chain INPUT (policy DROP)
 target prot opt source destination
 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:53
 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:53

Blocking IPs

Sometimes the internet is not as friendly as one would like; eventually you may need to block a specific IP address or range of IPs. For this example we are going to block the IP range 192.168.123.0/24.
 # iptables -I INPUT 3 -s 192.168.123.0/24 -j DROP
 # iptables -L -n
 Chain INPUT (policy DROP)
 target prot opt source destination
 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
 ACCEPT tcp -- 192.168.122.1 0.0.0.0/0 tcp dpt:22
 DROP all -- 192.168.123.0/24 0.0.0.0/0
 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:80
The above rule is a little trickier than the others because we already have a rule that accepts all traffic on port 80.
Our rule is to block all traffic from the IP range 192.168.123.0/24; in order for this traffic to be blocked, the rule must be listed in a specific spot within our ruleset. With iptables the first rule that matches a packet is applied, so for our rule to work we must add it before the port 80 rule; otherwise the IP range would still be able to connect to port 80.
The number 3 after INPUT tells iptables to place the rule 3rd in the list. Because we didn't specify any protocol, this rule will block all protocols from the specified IP range.
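To confirm exactly where the rule landed, you can list the chain with rule numbers by adding the --line-numbers option:
 # iptables -L INPUT -n --line-numbers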

Saving the rules for reboot

Each distribution of Linux has a different method for saving and restoring the iptables rules across reboots. In my opinion Red Hat variants have the best default.

Red Hat

In the Red Hat distributions you can save the current live iptables rules by using the init script. This saves the rules into the file /etc/sysconfig/iptables, which is later read by the init script during a reboot of the server or a simple restart of the service.
Saving your active rules
 # /etc/init.d/iptables save
Loading ip_conntrack on boot
In order to load the ip_conntrack_ftp module on boot you will need to edit the /etc/sysconfig/iptables-config file. This file is used by Red Hat's init script to load any related modules on boot.
 # vi /etc/sysconfig/iptables-config
Modify to add the following
 IPTABLES_MODULES="ip_conntrack_netbios_ns ip_conntrack ip_conntrack_ftp"

Ubuntu/Debian

Debian-based distributions, which include Ubuntu, do not offer such init scripts by default. There is however a package called iptables-persistent that provides an init script for Ubuntu/Debian variants with similar functionality to the Red Hat version.
Installing iptables-persistent
You can install this package with apt-get
 # apt-get install iptables-persistent
Saving your active rules
Once iptables-persistent is installed it will automatically save your current configuration into /etc/iptables/rules.v4. Any future modifications, however, will not be automatically saved; from then on you must save the rules in a similar fashion to the Red Hat version.
 # /etc/init.d/iptables-persistent save
  * Saving rules...
  * IPv4...
  * IPv6...
  ...done.

 # cat /etc/iptables/rules.v4
 # Generated by iptables-save v1.4.12 on Mon Sep 17 05:56:17 2012
 *filter
 :INPUT DROP [0:0]
 :FORWARD ACCEPT [0:0]
 :OUTPUT ACCEPT [42:4296]
 -A INPUT -i lo -j ACCEPT
 -A INPUT -s 192.168.122.1/32 -p tcp -m tcp --dport 22 -j ACCEPT
 -A INPUT -s 192.168.123.0/24 -j DROP
 -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
 -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
 -A INPUT -p tcp -m tcp --dport 21 -j ACCEPT
 -A INPUT -p tcp -m tcp --dport 20 -j ACCEPT
 -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 -A INPUT -p tcp -m tcp --dport 53 -j ACCEPT
 -A INPUT -p udp -m udp --dport 53 -j ACCEPT
 COMMIT
 # Completed on Mon Sep 17 05:56:17 2012
Making sure iptables-persistent is started on boot
Once your rules are saved, iptables-persistent will load them on boot. To verify that the script is added properly, simply check that it exists in /etc/rc2.d/.
 # runlevel
 N 2

 # ls -la /etc/rc2.d/ | grep iptables
 lrwxrwxrwx 1 root root 29 Sep 16 21:44 S37iptables-persistent -> ../init.d/iptables-persistent
Loading ip_conntrack on boot
For Debian the iptables-persistent package does not include an iptables-config file. You can instead add the module to the init script itself to ensure it is loaded before iptables starts.
 # vi /etc/init.d/iptables-persistent
Go to line #25 and add:
 /sbin/modprobe -q ip_conntrack_ftp

10 nmap Commands

Discover IP's in a subnet (no root)

 $ nmap -sP 192.168.0.0/24
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 09:37 MST
 Nmap scan report for 192.168.0.1
 Host is up (0.0010s latency).
 Nmap scan report for 192.168.0.95
 Host is up (0.0031s latency).
 Nmap scan report for 192.168.0.110
 Host is up (0.0018s latency).
This is one of the simplest uses of nmap. This command is commonly referred to as a "ping scan", and tells nmap to send an ICMP echo request, a TCP SYN to port 443, a TCP ACK to port 80, and an ICMP timestamp request to all hosts in the specified subnet. nmap will simply return a list of IPs that responded. Unlike many nmap commands this particular one does not require root privileges; however, when executed by root nmap will also send ARP requests on the subnet by default.

Scan for open ports (no root)

 $ nmap 192.168.0.0/24
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 09:23 MST
 Nmap scan report for 192.168.0.1
 Host is up (0.0043s latency).
 Not shown: 998 closed ports
 PORT STATE SERVICE
 80/tcp open http
 443/tcp open https
This scan is the default scan for nmap and can take some time to complete. With this scan nmap will attempt a TCP SYN connection to 1000 of the most common ports as well as an ICMP echo request to determine if a host is up. nmap will also perform a DNS reverse lookup on the identified IPs, as this can sometimes be useful information.

Identify the Operating System of a host (requires root)

 # nmap -O 192.168.0.164
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 09:49 MST
 Nmap scan report for 192.168.0.164
 Host is up (0.00032s latency).
 Not shown: 996 closed ports
 PORT STATE SERVICE
 88/tcp open kerberos-sec
 139/tcp open netbios-ssn
 445/tcp open microsoft-ds
 631/tcp open ipp
 MAC Address: 00:00:00:00:00:00 (Unknown)
 Device type: general purpose
 Running: Apple Mac OS X 10.5.X
 OS details: Apple Mac OS X 10.5 - 10.6 (Leopard - Snow Leopard) (Darwin 9.0.0b5 - 10.0.0)
 Network Distance: 1 hop
With the -O option nmap will try to guess the target's operating system. This is accomplished by utilizing information that nmap is already getting through the TCP SYN port scan. This is usually a best guess but can actually be fairly accurate. The operating system scan does, however, require root privileges.

Identify Hostnames (no root)

 $ nmap -sL 192.168.0.0/24
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 09:59 MST
 Nmap scan report for 192.168.0.0
 Nmap scan report for router.local (192.168.0.1)
 Nmap scan report for fakehost.local (192.168.0.2)
 Nmap scan report for another.fakehost.local (192.168.0.3)
This is one of the most subtle commands of nmap; the -sL flag tells nmap to do a simple DNS query for each specified IP. This allows you to find hostnames for all of the IPs in a subnet without having to send a packet to the individual hosts themselves.
Hostname information can tell you a lot more about a network than you might think; for instance, if you labeled your Active Directory server ads01.domain.com, you shouldn't be surprised if someone guesses its use.

TCP Syn and UDP Scan (requires root)

 # nmap -sS -sU -PN 192.168.0.164
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 13:25 MST
 Nmap scan report for 192.168.0.164
 Host is up (0.00029s latency).
 Not shown: 1494 closed ports, 496 filtered ports
 PORT STATE SERVICE
 88/tcp open kerberos-sec
 139/tcp open netbios-ssn
 445/tcp open microsoft-ds
 631/tcp open ipp
 88/udp open|filtered kerberos-sec
 123/udp open ntp
 137/udp open netbios-ns
 138/udp open|filtered netbios-dgm
 631/udp open|filtered ipp
 5353/udp open zeroconf
The TCP SYN and UDP scan will take a while to complete but is fairly unobtrusive and stealthy. This command will check about 2000 common TCP and UDP ports to see if they are responding. The -PN flag tells nmap to skip the ping scan and assume the host is up; this can be useful when a firewall might be preventing ICMP replies.

TCP SYN and UDP scan for all ports (requires root)

 # nmap -sS -sU -PN -p 1-65535 192.168.0.164
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 10:18 MST
 Nmap scan report for 192.168.0.164
 Host is up (0.00029s latency).
 Not shown: 131052 closed ports
 PORT STATE SERVICE
 88/tcp open kerberos-sec
 139/tcp open netbios-ssn
 445/tcp open microsoft-ds
 631/tcp open ipp
 17500/tcp open unknown
 88/udp open|filtered kerberos-sec
 123/udp open ntp
 137/udp open netbios-ns
 138/udp open|filtered netbios-dgm
 631/udp open|filtered ipp
 5353/udp open zeroconf
 17500/udp open|filtered unknown
 51657/udp open|filtered unknown
 54658/udp open|filtered unknown
 56128/udp open|filtered unknown
 57798/udp open|filtered unknown
 58488/udp open|filtered unknown
 60027/udp open|filtered unknown
This command is the same as above; however, by specifying the full port range from 1 to 65535, nmap will check whether the host is listening on every available port. You can use the port range specification on any scan that performs a port scan.
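As a side note, the -p flag also accepts comma-separated ports and mixed ranges (for example -p 22,80,443 or -p 1-1024), which is handy when you only care about a few services. A quick illustrative example:
 $ nmap -p 22,80,443 192.168.0.164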

TCP Connect Scan (no root)

 $ nmap -sT 192.168.0.164
 Starting Nmap 5.21 ( http://nmap.org ) at 2013-02-24 12:48 MST
 Nmap scan report for 192.168.0.164
 Host is up (0.0014s latency).
 Not shown: 964 closed ports, 32 filtered ports
 PORT STATE SERVICE
 88/tcp open kerberos-sec
 139/tcp open netbios-ssn
 445/tcp open microsoft-ds
 631/tcp open ipp
This command is similar to the TCP SYN scan; however, rather than sending a SYN packet and reviewing the headers, it asks the OS to establish a full TCP connection to each of the 1000 common ports.

Aggressively Scan Hosts (no root)

 $ nmap -T4 -A 192.168.0.0/24
 Nmap scan report for 192.168.0.95
 Host is up (0.00060s latency).
 Not shown: 996 closed ports
 PORT STATE SERVICE VERSION
 22/tcp open ssh OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)
 | ssh-hostkey: 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:6c (DSA)
 |_2048 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:6c (RSA)
 80/tcp open http nginx 1.1.19
 |_http-title: 403 Forbidden
 |_http-methods: No Allow or Public header in OPTIONS response (status code 405)
 111/tcp open rpcbind
 | rpcinfo:
 | program version port/proto service
 | 100000 2,3,4 111/tcp rpcbind
 | 100000 2,3,4 111/udp rpcbind
 | 100003 2,3,4 2049/tcp nfs
 | 100003 2,3,4 2049/udp nfs
 | 100005 1,2,3 46448/tcp mountd
 | 100005 1,2,3 52408/udp mountd
 | 100021 1,3,4 35394/udp nlockmgr
 | 100021 1,3,4 57150/tcp nlockmgr
 | 100024 1 49363/tcp status
 | 100024 1 51515/udp status
 | 100227 2,3 2049/tcp nfs_acl
 |_ 100227 2,3 2049/udp nfs_acl
 2049/tcp open nfs (nfs V2-4) 2-4 (rpc #100003)
 Service Info: OS: Linux; CPE: cpe:/o:linux:kernel
Unlike some of the earlier commands, this command is very aggressive and very obtrusive. The -A flag simply tells nmap to perform OS checking and version checking. The -T4 flag specifies the speed template; these templates tell nmap how quickly to perform the scan. Speed templates range from 0 for slow and stealthy to 5 for fast and obvious.

Fast Scan (no root)

 $ nmap -T4 -F 192.168.0.164
 Starting Nmap 6.01 ( http://nmap.org ) at 2013-02-24 12:49 MST
 Nmap scan report for 192.168.0.164
 Host is up (0.00047s latency).
 Not shown: 96 closed ports
 PORT STATE SERVICE
 88/tcp open kerberos-sec
 139/tcp open netbios-ssn
 445/tcp open microsoft-ds
 631/tcp open ipp
This scan limits itself to the 100 most common ports; if you simply want a quick and dirty check for hosts with ports open that shouldn't be, this is the command to use.

Verbose

 $ nmap -T4 -A -v 192.168.0.164
 Starting Nmap 6.01 ( http://nmap.org ) at 2013-02-24 12:50 MST
 NSE: Loaded 93 scripts for scanning.
 NSE: Script Pre-scanning.
 Initiating Ping Scan at 12:50
 Scanning 192.168.0.164 [2 ports]
 Completed Ping Scan at 12:50, 0.00s elapsed (1 total hosts)
 Initiating Parallel DNS resolution of 1 host. at 12:50
 Completed Parallel DNS resolution of 1 host. at 12:50, 0.01s elapsed
 Initiating Connect Scan at 12:50
 Scanning 192.168.0.164 [1000 ports]
 Discovered open port 139/tcp on 192.168.0.164
 Discovered open port 445/tcp on 192.168.0.164
 Discovered open port 88/tcp on 192.168.0.164
 Discovered open port 631/tcp on 192.168.0.164
 Completed Connect Scan at 12:50, 5.22s elapsed (1000 total ports)
 Initiating Service scan at 12:50
 Scanning 4 services on 192.168.0.164
 Completed Service scan at 12:51, 11.00s elapsed (4 services on 1 host)
 NSE: Script scanning 192.168.0.164.
 Initiating NSE at 12:51
 Completed NSE at 12:51, 12.11s elapsed
 Nmap scan report for 192.168.0.164
 Host is up (0.00026s latency).
 Not shown: 996 closed ports
 PORT STATE SERVICE VERSION
 88/tcp open kerberos-sec Mac OS X kerberos-sec
 139/tcp open netbios-ssn Samba smbd 3.X (workgroup: WORKGROUP)
 445/tcp open netbios-ssn Samba smbd 3.X (workgroup: WORKGROUP)
 631/tcp open ipp CUPS 1.4
 | http-methods: GET HEAD OPTIONS POST PUT
 | Potentially risky methods: PUT
 |_See http://nmap.org/nsedoc/scripts/http-methods.html
 | http-robots.txt: 1 disallowed entry
 |_/
 Service Info: OS: Mac OS X; CPE: cpe:/o:apple:mac_os_x
 
By adding verbose to a majority of the commands above you get a better insight into what nmap is doing; for some scans verbosity will provide additional details that the report does not provide.
While these are 10 very useful nmap commands, I am sure there are more handy nmap examples out there. If you have one to add to this list, feel free to drop it in a comment.

Monday, September 15, 2014

How to install Squid proxy server



Reasons to Create a Squid Proxy

Two important goals of many small businesses are to:
  • Reduce Internet bandwidth charges
  • Limit access to the Web to only authorized users.
The Squid web caching proxy server can achieve these fairly easily.
Users configure their web browsers to use the Squid proxy server instead of going to the web directly. The Squid server then checks its web cache for the web information requested by the user. It will return any matching information it finds in its cache, and if not, it will go to the web to find it on behalf of the user. Once it finds the information, it populates its cache with it and also forwards it to the user's web browser.
As you can see, this reduces the amount of data accessed from the web. Another advantage is that you can configure your firewall to accept HTTP web traffic only from the Squid server and no one else. Squid can then be configured to request usernames and passwords from each user that uses its services. This provides simple access control to the Internet.

Reasons to Create a Squid Reverse Proxy

The Apache web server distributes its load across multiple child processes. When the number of queries from web surfers gets too high, more short-lived httpd processes are created to handle the increased connections. The creation of these processes is CPU intensive and can make your server sluggish in extreme cases.
Squid can reduce the need to create these processes by aggregating incoming requests from multiple web surfers and converting them into a single stream encapsulated in a single connection.
In the reverse proxy configuration, Squid caches the Apache httpd responses in memory. It will respond with this stored data instead of querying Apache whenever possible.
The combination of caching and reverse proxying makes Squid an asset in reducing load, and increasing the responsiveness of your Apache server.

Download and Install The Squid Package

Most RedHat and Fedora Linux software product packages are available in the RPM format, whereas Debian and Ubuntu Linux use DEB format installation files. When searching for these packages remember that the filename usually starts with the software package name and is followed by a version number, as in squid-3.1.9-3.fc14.i686.rpm. (For help on downloading and installing the package, see Chapter 6, "Installing Linux Software").

Starting Squid

The methodologies vary depending on the variant of Linux you are using as you’ll see next.
Fedora / CentOS / RedHat
With these flavors of Linux you can use the chkconfig command to get squid configured to start at boot:
[root@bigboy tmp]# chkconfig squid on
To start, stop, and restart squid after booting use the service command:
[root@bigboy tmp]# service squid start
[root@bigboy tmp]# service squid stop
[root@bigboy tmp]# service squid restart
To determine whether squid is running you can issue either of these two commands. The first will give a status message. The second will return the process ID numbers of the squid daemons.
[root@bigboy tmp]# service squid status
[root@bigboy tmp]# pgrep squid
Note: Remember to run the chkconfig command at least once to ensure squid starts automatically on your next reboot.
Ubuntu / Debian
With these flavors of Linux the commands are different. Try installing the sysv-rc-conf and sysvinit-utils DEB packages, as they provide commands that simplify the process. (For help on downloading and installing the packages, see Chapter 6, "Installing Linux Software".) You can use the sysv-rc-conf command to get squid configured to start at boot:
user@ubuntu:~$ sudo sysv-rc-conf squid on
To start, stop, and restart squid after booting the service command is the same:
user@ubuntu:~$ sudo service squid start
user@ubuntu:~$ sudo service squid stop
user@ubuntu:~$ sudo service squid restart
To determine whether squid is running you can issue either of these two commands. The first will give a status message. The second will return the process ID numbers of the squid daemons.
user@ubuntu:~$ sudo service squid status
user@ubuntu:~$ pgrep squid
Note: Remember to run the sysv-rc-conf command at least once to ensure squid starts automatically on your next reboot.

Squid Configuration Files

You can define most of Squid’s configuration parameters in the squid.conf file which may be located in either the /etc or /etc/squid directory depending on your version of Linux.
Remember to restart or reload Squid after you make any changes to your configuration files; otherwise the new settings will not take effect.
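As a side note, Squid can also re-read its configuration file in place, without a full restart, via the -k reconfigure option:
[root@bigboy tmp]# squid -k reconfigure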

General Squid Configuration Guidelines

Each Squid server in your administrative zone has to be uniquely identifiable by either its hostname listed in the /etc/hosts file or the value set in the visible_hostname directive in squid.conf. This is especially important in more complex configurations where clusters of Squid servers pool their resources in order to achieve some common caching goal.
Your /etc/hosts file should be configured with your server’s hostname at the end of the localhost line. In this example the server name “bigboy” has been correctly added.
# File: /etc/hosts
127.0.0.1   localhost localhost.localdomain bigboy
If you want to give your Squid process a name that is different from your hostname, then add the visible_hostname directive to your squid.conf file. In this example, we give the server the hostname “cache-001”.
# File: squid.conf
visible_hostname cache-001
Misconfigured Squid instances will give an error like this when the hostname isn’t correctly defined
WARNING: Could not determine this machines public hostname. Please configure one or set 'visible_hostname'.
Now it’s time to configure proxies and reverse proxies.

Configuring Squid Proxies

Squid offers many options to manage access to the web for security, legal, and resource utilization reasons. We'll cover a few of these in the sections that follow.

Access Control Lists

You can limit users' ability to browse the Internet with access control lists (ACLs). Each ACL line defines a particular type of activity, such as an access time or source network; ACLs are then linked to an http_access statement that tells Squid whether to deny or allow traffic that matches the ACL.
Squid matches each Web access request it receives by checking the http_access list from top to bottom. If it finds a match, it enforces the allow or deny statement and stops reading further. You have to be careful not to place a deny statement in the list that blocks a similar allow statement below it. The final http_access statement denies everything, so it is best to place new http_access statements above it.
Note: The very last http_access statement in the squid.conf file denies all access. You therefore have to add your specific permit statements above this line. In the chapter's examples, I've suggested that you place your statements at the top of the http_access list for the sake of manageability, but you can put them anywhere in the section above that last line.
Squid has a minimum required set of ACL statements in the ACCESS_CONTROL section of the squid.conf file. It is best to put new customized entries right after this list to make the file easier to read.

Restricting Web Access By Time

You can create access control lists with time parameters. For example, you can allow only business hour access from the home network, while always restricting access to host 192.168.1.23.
#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl RestrictedHost src 192.168.1.23
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access deny RestrictedHost
http_access allow home_network business_hours
Or, you can allow morning access only:
#
# Add this to the bottom of the ACL section of squid.conf
#
acl mornings time 08:00-12:00
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow mornings

Restricting Access to specific Web sites

Squid is also capable of reading files containing lists of web sites and/or domains for use in ACLs. In this example we create two lists in files named /usr/local/etc/allowed-sites.squid and /usr/local/etc/restricted-sites.squid.
# File: /usr/local/etc/allowed-sites.squid
www.openfree.org
linuxhomenetworking.com
 
# File: /usr/local/etc/restricted-sites.squid
www.porn.com
illegal.com
These can then be used to always block the restricted sites and permit the allowed sites during working hours. This can be illustrated by expanding our previous example slightly.
#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/24
acl business_hours time M T W H F 9:00-17:00
acl GoodSites dstdomain "/usr/local/etc/allowed-sites.squid"
acl BadSites  dstdomain "/usr/local/etc/restricted-sites.squid"
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access deny BadSites
http_access allow home_network business_hours GoodSites

Restricting Web Access By IP Address

You can create an access control list that restricts Web access to users on certain networks. In this case, it's an ACL that defines a home network of 192.168.1.0.
#
# Add this to the bottom of the ACL section of squid.conf
#
acl home_network src 192.168.1.0/255.255.255.0
You also have to add a corresponding http_access statement that allows traffic that matches the ACL:
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow home_network

Password Authentication Using NCSA

You can configure Squid to prompt users for a username and password. Squid comes with a program called ncsa_auth that reads any NCSA-compliant encrypted password file. You can use the htpasswd program that comes installed with Apache to create your passwords. Here is how it's done:
1) Create the password file. The name of the password file should be /etc/squid/squid_passwd, and you need to make sure that it's universally readable.
[root@bigboy tmp]# touch /etc/squid/squid_passwd
[root@bigboy tmp]# chmod o+r /etc/squid/squid_passwd
2) Use the htpasswd program to add users to the password file. You can add users at anytime without having to restart Squid. In this case, you add a username called www:
[root@bigboy tmp]# htpasswd /etc/squid/squid_passwd www
New password:
Re-type new password:
Adding password for user www
[root@bigboy tmp]#
3) Find your ncsa_auth file using the locate command.
[root@bigboy tmp]# locate ncsa_auth
/usr/lib/squid/ncsa_auth
[root@bigboy tmp]#
4) Edit squid.conf; specifically, you need to define the authentication program in squid.conf, which is in this case ncsa_auth. Next, create an ACL named ncsa_users with the REQUIRED keyword that forces Squid to use the NCSA auth_param method you defined previously. Finally, create an http_access entry that allows traffic that matches the ncsa_users ACL entry. Here's a simple user authentication example; the order of the statements is important:
#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd
 
#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users
5) This example requires password authentication and allows access only during business hours. Once again, the order of the statements is important:
#
# Add this to the auth_param section of squid.conf
#
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/squid_passwd
 
#
# Add this to the bottom of the ACL section of squid.conf
#
acl ncsa_users proxy_auth REQUIRED
acl business_hours time M T W H F 9:00-17:00
 
#
# Add this at the top of the http_access section of squid.conf
#
http_access allow ncsa_users business_hours
Remember to restart Squid for the changes to take effect.

Enforcing The Use of Your Squid Forward Proxy Server

If you are using access controls on Squid, you may also want to configure your firewall to allow HTTP Internet access only from the Squid server. This forces your users to browse the Web through the Squid proxy.

Making Your Squid Server Transparent To Users

It is possible to limit HTTP Internet access to only the Squid server without having to modify the browser settings on your client PCs. This is called a transparent proxy configuration. It is usually achieved by configuring a firewall between the client PCs and the Internet to redirect all HTTP (TCP port 80) traffic to the Squid server on TCP port 3128, which is the Squid server's default TCP port.

Squid Transparent Proxy Configuration

Your first step will be to modify your squid.conf to create a transparent proxy. The procedure is different depending on your version of Squid.
Prior to version 2.6: In older versions of Squid, transparent proxy was achieved through the use of the httpd_accel options which were originally developed for http acceleration. In these cases, the configuration syntax would be as follows:
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Version 2.6 to 3.0: These versions of Squid simply require you to add the word "transparent" to the default "http_port 3128" statement. In this example, Squid not only listens on TCP port 3128 for proxy connections, but will also do so in transparent mode.
http_port 3128 transparent
Version 3.1+: Newer versions of Squid use the "intercept" keyword on the "http_port 3128" statement when transparent proxying uses an HTTP redirect. If redirection isn't being used, the "transparent" keyword is still used. Here is an example:
http_port 3128 intercept
Or
http_port 3128 transparent
Note: Remember to restart Squid for the changes to take effect

Configuring iptables to Support the Squid Transparent Proxy

The examples below are based on the discussion of Linux iptables in Chapter 14, "Linux Firewalls Using iptables". Additional commands may be necessary for your particular network topology.
In both cases below, the firewall is connected to the Internet on interface eth0 and to the home network on interface eth1. The firewall is also the default gateway for the home network and handles network address translation on all the network's traffic to the Internet.
Only the Squid server has access to the Internet on port 80 (HTTP), because all HTTP traffic, except that coming from the Squid server, is redirected.
Squid Server and Firewall – Same Server (HTTP Redirect)
If the Squid server and firewall are the same server, all HTTP traffic from the home network is redirected to the firewall itself on the Squid port of 3128 and then only the firewall itself is allowed to access the Internet on port 80.
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 \
        -j REDIRECT --to-port 3128
iptables -A INPUT -j ACCEPT -m state \
        --state NEW,ESTABLISHED,RELATED -i eth1 -p tcp \
        --dport 3128
iptables -A OUTPUT -j ACCEPT -m state \
        --state NEW,ESTABLISHED,RELATED -o eth0 -p tcp \
        --dport 80
iptables -A INPUT -j ACCEPT -m state \
        --state ESTABLISHED,RELATED -i eth0 -p tcp \
        --sport 80
iptables -A OUTPUT -j ACCEPT -m state \
        --state ESTABLISHED,RELATED -o eth1 -p tcp \
        --sport 80
Note: This example is specific to HTTP traffic. You won't be able to adapt it to support HTTPS web browsing on TCP port 443, because that protocol specifically prevents the insertion of a "man in the middle" server for security purposes. One solution is to add IP masquerading statements for port 443, or any other important traffic, immediately after the code snippet; a sketch follows below. This allows non-HTTP traffic to access the Internet without being cached by Squid.
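As an illustration only, such masquerading statements might look like this, assuming the same eth0 (Internet) and eth1 (home network) layout as above:
# Let HTTPS bypass the Squid redirect and go straight to the Internet
iptables -A FORWARD -i eth1 -o eth0 -p tcp --dport 443 -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0 -p tcp --dport 443 -j MASQUERADE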
Squid Server and Firewall – Different Servers
If the Squid server and firewall are different servers, the statements are different. You need to set up iptables so that all connections to the Web not originating from the Squid server are converted into three connections: one from the Web browser client to the firewall, another from the firewall to the Squid server, and a third that the Squid server makes to the Web to service the request. The Squid server then gets the data and replies to the firewall, which relays this information to the Web browser client. The iptables program does all this using these NAT statements:
iptables -t nat -A PREROUTING -i eth1 -s ! 192.168.1.100 \
        -p tcp --dport 80 -j DNAT --to 192.168.1.100:3128
iptables -t nat -A POSTROUTING -o eth1 -s 192.168.1.0/24 \
        -d 192.168.1.100 -j SNAT --to 192.168.1.1
iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.1.100 \
        -i eth1 -o eth1 -m state \
        --state NEW,ESTABLISHED,RELATED \
        -p tcp --dport 3128 -j ACCEPT
iptables -A FORWARD -d 192.168.1.0/24 -s 192.168.1.100 \
        -i eth1 -o eth1 -m state --state ESTABLISHED,RELATED \
        -p tcp --sport 3128 -j ACCEPT
In the first statement, all HTTP traffic from the home network, except that from the Squid server at IP address 192.168.1.100, is redirected to the Squid server on port 3128 using destination NAT. The second statement makes this redirected traffic also undergo source NAT so that it appears to come from the firewall itself. The FORWARD statements ensure the traffic is allowed to flow to the Squid server after the NAT process is complete. The unusual feature is that all the NAT takes place on a single interface, that of the home network (eth1).
You will additionally have to make sure your firewall has rules to allow your Squid server to access the Internet on HTTP TCP port 80 as covered in Chapter 14, "Linux Firewalls Using iptables".
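As a rough sketch of what those additional firewall rules might look like, again assuming the Squid server is 192.168.1.100 on eth1 and the Internet is reached via eth0:
# Let the Squid server reach the Web on port 80 and receive the replies
iptables -A FORWARD -s 192.168.1.100 -i eth1 -o eth0 -p tcp \
        --dport 80 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -d 192.168.1.100 -i eth0 -o eth1 -p tcp \
        --sport 80 -m state --state ESTABLISHED,RELATED -j ACCEPT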

Manually Configuring Web Browsers To Use Your Squid Server

If you don't have a firewall that supports redirection, then you need to configure your firewall to accept HTTP Internet access only from the Squid server, and configure each browser's proxy settings to point at the Squid server. The exact method depends on your browser.
Make sure the proxy server setting uses the Squid server's IP address or fully qualified domain name (for example, proxy.my-site.com) and that the TCP port is 3128, the Squid default.
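Command-line clients are easily forgotten here. Many of them, such as wget and curl, take their proxy settings from environment variables instead. A minimal sketch, reusing the hypothetical proxy.my-site.com name from above:
export http_proxy=http://proxy.my-site.com:3128
export ftp_proxy=http://proxy.my-site.com:3128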
Your Squid configuration should now be complete.

Configuring Squid Reverse Proxies

This requires configuring both Squid and Apache. Here are the steps you need to follow.

Squid Configuration

Setting up the Squid reverse proxy is easy, but not entirely straightforward; there are a lot of things to take into consideration, and we'll discuss them next.
1. The first step is to use the http_port directive to define some key elements of the configuration.
http_port 99.184.206.67:80 accel ignore-cc defaultsite=www.my-site.com vhost
First we define the IP address (99.184.206.67) and port (80) on which Squid should listen. The ICP protocol, which will be discussed later, uses Cache-Control headers to communicate between inter-related caches. This is not required here, and the ignore-cc option disables the feature.
If a web browser visits the server by IP address, or the website isn't defined later in the our_sites ACL, then the site named in the defaultsite option is displayed; in this case it's www.my-site.com. Because the server is also acting as a virtual host, running multiple websites, we have to flag this too with the vhost option.
Finally, accelerator, or reverse proxy mode is defined with the accel option.
2. Define an ACL for all the sites hosted on your server with the dstdomain option, then allow users to access them using the http_access directive:
acl our_sites dstdomain www.my-site.com my-site.com www.my-other-site.com my-other-site.com
http_access allow our_sites
3. Use the cache_peer directive to define the server that is going to cache the content. The syntax is as follows:
# !!! NOTE !!! 
#
# Do not use this line in your configuration
 
cache_peer hostname type http-port icp-port [options]
We will now review each of these options, which will be necessary to configure the reverse proxy on your server. Here is a real example:
# Use this line in your configuration
cache_peer 127.0.0.1 parent 80 0 no-query originserver name=myAccel
In this case the cache_peer is Squid running on localhost.
Caches can have parents and children. When a child doesn’t have content to serve, it will refer to the parent as a caching source of last resort using the ICP protocol. In this example, there is only one cache, so we have to define localhost as the parent with the parent keyword. ICP is also disabled with the no-query keyword.
The TCP port on which the localhost Apache origin server will be listening is port 80. ICP is further disabled by defining its port as 0.
When a parent cache cannot find its content, it replenishes its cache with newer content from an origin server; the originserver keyword indicates that this, too, will be localhost.
The myAccel name will be used later to define the sites that will be cached on the server.
4. The cache_peer_access directive is now used to allow all the sites listed in the our_sites ACL to be cached and deny all others.
cache_peer_access myAccel allow our_sites
cache_peer_access myAccel deny all
5. Every HTTP request states the IP address of the client in its header. If the request also happens to go through a proxy server, the proxy server creates an additional X-Forwarded-For header to which it adds the remote client's IP address as well as its own. The following directive truncates any such list so that only a single IP address remains, that added by the last proxy server.
forwarded_for truncate
Squid, being a proxy server, will then add its own IP address (127.0.0.1) to the X-Forwarded-For list. The Apache logging will have to be modified to account for this IP address; this will be covered later.
Note: If your Apache logs show two or more IP addresses for clients when using Squid, you have probably left out this forwarded_for step.
6. Though we have previously disabled ICP, Squid will still send cache-related HTTP headers to web clients. These need to be disabled too.
acl localnet src 99.184.206.64/27
via off
reply_header_access X-Cache-Lookup deny !localnet
reply_header_access X-Squid-Error deny !localnet
reply_header_access X-Cache deny !localnet
Here we define the local network as 99.184.206.64/27 and then suppress these headers for all remote users.
7. We don't want to cache the results, because caching can cause issues with some static content. For example, if the first access to your website occurs while it is under maintenance, a "Closed" page will be served; if this page is cached, it will continue to be displayed until the cache expires. Disabling caching eliminates this problem.
# Do not allow caching memory structures to be created
memory_pools off
 
# Turn off caching
cache deny all
8. The very last configuration step is to search the remainder of your squid.conf file and comment out any http_port statement that may have been configured previously. Remember, Squid is normally configured to listen on its default TCP port 3128, but we set it to TCP port 80 earlier.
# Squid normally listens to port 3128
#http_port 3128
9. Finally, shut down Apache and restart Squid. Also make sure Squid will start automatically when you reboot; a sketch of the commands follows.
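On a RedHat-style system these steps might look like the following; service and script names may differ on your distribution:
service httpd stop       # free TCP port 80 for Squid
service squid restart    # Squid now answers on port 80
chkconfig squid on       # start Squid automatically at boot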
The next phase will require configuring Apache to work with the newly activated Squid daemon.

Apache Configuration

You will now have to configure Apache to not only work with Squid, but also to listen on localhost instead of the network interface. Here are the steps to follow.
1. In your Apache conf.d directory add a file with the following commands.
The first section makes sure Apache listens on the localhost address and not that of the network interface. It also makes sure your virtual hosts will do the same.
The next section defines how logging will be handled when X-Forwarded-For entries are present. Notice that the combined_squid format will only be used when the squid_request condition is met; this condition is defined later.
The final section tells Apache how to recognize requests relayed by Squid. If there is no X-Forwarded-For header (represented by the regular expression ^$), the request is tagged with the normal_request variable. If there is an X-Forwarded-For header (represented by the regular expression .+), the request is tagged with the squid_request variable.
#
# Define listening IP addresses
#
 
Listen 127.0.0.1:80
NameVirtualHost 127.0.0.1:80
 
#
# Squid Logging format
#
 
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_squid
 
#
# Set up how you handle Squid cache headers
#
 
SetEnvIf X-Forwarded-For    "^$"         normal_request
SetEnvIf X-Forwarded-For    ".+"         squid_request
2. The final steps are to change your VirtualHost entries to refer to 127.0.0.1:80 and then tell Apache to use the combined_squid format whenever Squid passes logging data to Apache with an X-Forwarded-For header, as flagged by the squid_request variable.
<VirtualHost 127.0.0.1:80>
 
    CustomLog   logs/access_log combined_squid env=squid_request
    CustomLog   logs/access_log combined env=normal_request
 
</VirtualHost>
Note: Remember to restart Apache for these changes to take effect.
With this last step your server will be ready to operate your website. It should be noticeably faster and make the user experience more pleasant.
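A quick, hedged way to confirm the reverse proxy is answering, run from a remote machine and using the example IP address and site name from above:
# Fetch only the response headers for www.my-site.com via the Squid listener
curl -s -D - -o /dev/null -H "Host: www.my-site.com" http://99.184.206.67/
A 200 response here indicates that Squid accepted the request and relayed it to the Apache origin server on localhost.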

 

Squid Disk Usage

Squid uses the /var/spool/squid directory to store its cache files. High-usage Squid servers need a large amount of disk space in the /var partition to get optimum performance.
Every webpage and image accessed via the Squid server is logged in the /var/log/squid/access.log file. This can get quite large on high usage servers. Fortunately, the logrotate program automatically purges this file.
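Two quick commands, offered as a sketch, can help you keep an eye on both:
du -sh /var/spool/squid            # current size of the cache on disk
ls -lh /var/log/squid/access.log   # how large the access log has grown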

Troubleshooting Squid

Squid logs both informational and error messages to files in the /var/log/squid/ directory. It is best to review these files first whenever you have difficulties. The squid.out file can be especially useful, as it contains Squid's system errors.
Another source of trouble can be unintended statements in the squid.conf file that produce no error messages; misconfigured hours of access and permitted networks that were never added are just two possibilities.
By default, Squid operates on port 3128, so if you are having connectivity problems, you'll need to follow the troubleshooting steps in Chapter 4, "Simple Network Troubleshooting", to help rectify them.
Note: Some of Squid's capabilities go beyond the scope of this book, but you should be aware of them. For example, for performance reasons, you can configure child Squid servers on which certain types of content are exclusively cached. Also, you can restrict the amount of disk space and bandwidth Squid uses.
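For illustration only, restricting disk space and bandwidth might look like the following squid.conf snippet; the values are arbitrary, and the delay pool shown is the simplest (class 1) type:
# Cap the on-disk cache at 5000 MB
cache_dir ufs /var/spool/squid 5000 16 256
 
# Throttle all users to a combined rate of roughly 125 KB/s
delay_pools 1
delay_class 1 1
delay_parameters 1 128000/128000
delay_access 1 allow all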

Conclusion

Tools such as Squid are popular with many company managers. By caching images and files on a server shared by all, Internet bandwidth charges can be reduced.
Squid's password authentication feature is well liked because it allows only authorized users to access the Internet, reducing usage fees and distractions in the office. Unfortunately, an Internet access password is usually not viewed as a major security concern by most users, who are often willing to share it with their colleagues. Although it is beyond the scope of this book, you should consider automatically tying the Squid password to the user's regular login password. This will make them think twice about giving their passwords away. Internet access is one thing; letting your friends have full access to your e-mail and computer files is quite another.