www.trisk.com / Scamper's Homepage
Procmail tips and pointers
by Garen Erdoisa (aka: Scamper)

Some Background and Acknowledgements - for the curious.


  You should be able to copy the rendered version of this page using CTRL-A CTRL-C, then paste it into an editor such as vi or vim, then delete everything outside of the ---cut here--- lines to be able to play around with the file.

     Important! procmail does not like files that have end of line characters that include a CR (Carriage Return). Procmail will normally just skip any such lines and continue on.

Procmail recipes treat trailing space characters as part of the regular expression being evaluated, but strip leading spaces and tabs from a regular expression before evaluating it. Enclose any such whitespace in parentheses ( ) if you want to force procmail to treat it as part of the expression. The end of line character in a procmail recipe is just a LF (Line Feed). A procmail regular expression can be continued on the next line by ending the prior line with a backslash "\" character. Some of these rules may also apply to external programs such as awk, sed, grep, or perl that also use regular expressions; for the rules pertaining to those programs, study their man pages or info pages.
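Since procmail skips lines that end in a CR, it can help to check a recipe file for carriage returns before using it. The following is only a minimal sketch, assuming a POSIX shell with grep and tr available; the function names has_cr and strip_cr are made up for this example and are not part of procmail.

```shell
#!/bin/sh
# Hypothetical helper functions (the names are assumptions, not procmail tools).

# has_cr FILE: succeeds if FILE contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')        # a literal CR character
    grep -q "$cr" "$1"
}

# strip_cr FILE: writes FILE to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}
```

For example, "has_cr test_procmail.rc && strip_cr test_procmail.rc > fixed.rc" would write a cleaned copy only when the file actually needs it.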

On OpenBSD systems using non-GNU versions of sed, you may have to use literal control characters in sed regular expression character lists instead of escape sequences such as \t (<tab>, or ^i). In an editor such as vi, you can insert such control characters by typing CTRL-V before the character, i.e. ^v^i inserts a literal tab in place of \t, ^v^m inserts a CR (\r), and ^v^j inserts a LF (\n).


Examples using vi or vim as the editor:

GNU version of sed that will replace one or more spaces and tabs with single spaces in a line:
 sed -e 's/[ \t]\+/ /g'

Older, non-GNU version of sed that will replace one or more spaces and tabs with single spaces in a line (type ^v^i as CTRL-V CTRL-I to enter a literal tab):
 sed -e 's/[ ^v^i]\{1,\}/ /g'
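If typing literal control characters in an editor is inconvenient, one possible alternative is to let the shell produce the tab. This is only a sketch, assuming a POSIX shell and printf; the variable name TAB and the function name squeeze_ws are made up for this example.

```shell
#!/bin/sh
# Hold a literal tab in a variable so the sed script needs no \t escape.
TAB=$(printf '\t')

# squeeze_ws: read stdin, replace each run of spaces and tabs with one space.
# Uses the \{1,\} interval form, which also works in non-GNU versions of sed.
squeeze_ws() {
    sed -e "s/[ ${TAB}]\{1,\}/ /g"
}
```

It would be used as a filter, for example: formail -cx "To:" | squeeze_ws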

  To run a procmail file in a test mode, set the variables as appropriate then type something like the following on a command line:

    cat /path/to/raw_spam_email.txt |procmail /path/to/test_procmail.rc

---CUT HERE---

# Global procmail definitions

# Define a file for procmail to send its log information.

# Make sure procmail verbose logging is turned off.

# Define a new line character for use in procmail LOG entries.
# note: the quote spanning two lines below is deliberate.

# Directory where we will store mail folders
# Note: This directory MUST exist!

#Mail folder for incoming whitelisted listmail

# Location of formail on our system. (for use in procmail actions since
# those typically need a shell meta pattern in procmail action lines to work as intended)

# Location of file containing From: addresses of people we correspond with on a regular basis

#Location of a folder containing blacklisted email.

# Uncomment this if you would rather just delete the blacklisted email.

# Location of a file containing regular expressions of patterns that we don't want
# to see in the Subject:, From:, or Reply-to: headers

# Location of file containing To: addresses we have given to news letters
# or web sites that map to my real account via sendmail aliases.

# Capture the message ID string (if any) for future reference in log entries.
:0
* ^Message-ID:
{ MESSAGEID=`${FORMAIL} -cx "Message-ID:" |sed -e 's/[ \t]\{1,\}//g'` }

:0 E
{ MESSAGEID='none' }

# Sample procmail recipe to whitelist mail that was sent to a sendmail alias that is mapped to this account
# that is also listed in our SUBAUTH (Subscription Authorization) file.

:0:
* ? test -f ${SUBAUTH} && (${FORMAIL} -cx "To:" -cx "Cc:" |\
     sed -e 's/[ \t]\{1,\}/ /g' |\
     grep -iFf ${SUBAUTH})
|${FORMAIL} -A "X-Folder: Authorized Subscriptions" >>${LISTMAILFOLDER}

# The above procmail recipe uses a procmail feature to execute programs that are external to procmail. It extracts and unfolds the contents of the To: and Cc: headers so that each header sits on one line, replaces runs of tabs and spaces with single spaces, then searches the result for any of the fixed strings listed in the SUBAUTH file. If that pipeline returns an exit status of zero, the condition matches and the action is taken. The action in this case is to add an X-Folder: header to the headers of the email, then append the email to a folder using an implied lockfile.

# Sample procmail recipe to whitelist mail that was sent from an address in our NOBOUNCE file.

:0:
* ? test -f ${NOBOUNCE} && (${FORMAIL} -cx "From:" -cx "Reply-to:" |\
     sed -e 's/[ \t]\{1,\}/ /g' |\
     grep -iFf ${NOBOUNCE})
|${FORMAIL} -A "X-Folder: Locally Authorized Sender" >>${DEFAULT}

# Similar to the above, but tests strings in the From: and Reply-to: headers vs strings in a ${NOBOUNCE} file

# Sample procmail recipe to filter email containing blacklisted patterns in the headers.

:0
* ? test -f ${BLACKLIST_PATTERNS} && (${FORMAIL} -cx "Subject:" -cx "From:" -cx "Reply-to:" |\
     sed -e 's/[ \t]\{1,\}/ /g' |\
     grep -iEf ${BLACKLIST_PATTERNS})
{
 LOGSTRING="Spam - Found a blacklisted pattern"
 LOG="[$$]$_: ${LOGSTRING}. Email delivered to ${SPAMFOLDER}${NL}"

 :0: ${MAILDIR}/spamfolder.lock
 |${FORMAIL} -A "X-folder: Spam. ${LOGSTRING}" >>${SPAMFOLDER}
}

# This recipe is similar to the above whitelist recipes, but instead it searches a file defined in the BLACKLIST_PATTERNS variable for a list of regular expressions, one regular expression per line in that file.

# If any of the text in the Subject:, From:, or Reply-to: headers matches a regular expression found in that file, then we will execute the code inside the procmail nesting block.

# In that nesting block, we set a LOGSTRING variable, then use that variable in a LOG line, which gets sent to the procmail LOGFILE defined above in the global variables section. Next we file the email into the folder defined in the SPAMFOLDER variable using a literal lockfile, instead of letting procmail derive its own lockfile. The literal lockfile is necessary in this case because if SPAMFOLDER is set to /dev/null, and the process cannot write to the /dev directory, then an error will be generated. So we set a literal lockfile in a directory that procmail has permission to use.

# Note that procmail lockfiles should only be used on delivering procmail recipes where it makes sense to use them.


# Sample procmail recipe to enumerate the Received: headers, and store them
# in the ${RECEIVEDHEAD} variable. Note the backticks that launch an embedded
# shell script.

:0 W
* H ?? 1^1 ^Received:
{
 RECEIVEDHEAD=`${FORMAIL} -cX "Received:" |\
  cat -n |\
  sed -e 's/[ \t]\{1,\}/ /g ; s/^ // ; s/^[0-9]\{1,\}/&:/'`
}


# The above recipe will extract the "Received:" headers. Then it will unfold the headers putting each one all on one line, number the lines, eliminate multiple tabs and spaces replacing them with a single space, delete the leading space, then append a colon after the new line numbers. The contents of the RECEIVEDHEAD variable can then be parsed later on in the procmail recipes and you won't have to guess at which received header you are looking at since they now have line numbers prepended. It also makes it easier to do regular expression pattern matching since you don't have to deal with multiple embedded tabs and spaces. This technique also has the advantage of not modifying the headers in the email itself. Note that it is often a requirement of spam reporting services or abuse desks to report emails with the headers left in their original unaltered form.

# Finally the recipe will dump the contents of the variables we just created into the procmail log file for future reference.


# Sample procmail recipe which will extract the IPv4 address from the first
# Received: header. This could be adapted if you have several internal
# servers through which the mail passes.
# Also, the header IP extraction in this recipe is assuming that the header line was
# generated by sendmail. If you are using another server, you may need to adjust
# the regular expression to accommodate that.

# Initialize the SOURCEIP variable

:0
* RECEIVEDHEAD ?? ^1: Received: from .*\(.*\[\/[0-9.]+
{
 SOURCEIP=${MATCH}
 LOG="[$$]$_: Extracted IP ${SOURCEIP} from first Received: header.${NL}"
}

:0 E
{ LOG="[$$]$_: Failed to find any source IP in the first Received: header.${NL}" }

# Sample procmail recipe which will generate the reverse IPv4 from
# the SOURCEIP, for use in blocklist lookups.
# It will also verify that the number we are looking at is a real Internet
# address.

# Initialize the SOURCEIPREV variable

# Check for valid IPv4 address range.
# Then if the address is not an IANA non-routable address
# generate the reverse IP for use in subsequent DNS lookups.

# Build a procmail style regular expression to test for a valid IPv4 range.

# Build a procmail style regular expression to test for IPv4 ranges that should not be used on the Internet.
# These are based on RFC-3330 Para 3 summary table.
# Note: These expressions should be periodically verified and updated as needed

# Combine the above into one regular expression.
# Note: IP is included in network defined above
#       as part of the CLASSA regular expression variable.

* ! SOURCEIP ?? ^(000\.000\.000\.000)$
 * $ ! SOURCEIP ?? ^${RFC_3330_INVALID}$
  * SOURCEIP ?? ^[0-9]+\.[0-9]+\.[0-9]+\.\/[0-9]+
  { QUAD4=${MATCH} }
  * SOURCEIP ?? ^[0-9]+\.[0-9]+\.\/[0-9]+
  { QUAD3=${MATCH} }
  * SOURCEIP ?? ^[0-9]+\.\/[0-9]+
  { QUAD2=${MATCH} }
  * SOURCEIP ?? ^\/[0-9]+
  { QUAD1=${MATCH} }
  LOG="[$$]$_: IP ${SOURCEIP} is a valid IPv4 address${NL}"
 :0 E
  LOG="[$$]$_: IP ${SOURCEIP} is an IANA Non-Routable IPv4 address${NL}"

:0 E
 LOG="[$$]$_: Error - ${SOURCEIP} has an invalid range for an IPv4 address.${NL}"

# Added this section after a discussion about it came up on comp.mail.misc
# today (5/2/2006). Used something similar to this as an example there.
# Sample procmail recipe that will create a cache file that detects
# duplicate messages, and send any such duplicates to /dev/null

# I suggest using a buffer size of at least 35k bytes to
# retain about 500 lines in the cache file without
# running into problems with procmail truncating the cache
# file. Also note that some versions of procmail limit
# the LINEBUF size to about 35k.


# Define the command to use to generate the message body digest
DIGEST='/usr/bin/openssl sha1'

# Note the backticks here that launch an embedded shell script.
# This creates a string that looks like this:
# <4404ff446015133ca9972023a5b1af9876f788c8@[]>
# ie: <sha1 message body hash@[IPv4]>

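# The leading s/^.*= *// drops the "(stdin)= " style prefix that some
# versions of openssl print in front of the hash.
MESSAGEBODYDIGEST=`${FORMAIL} -I "" |${DIGEST} |sed -e "s/^.*= *// ; s/[ -]\{1,\}//g ; s/^/ </ ; s/$/@[${SOURCEIP}]>/"`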

:0
* ? touch ${HOME}/.digestcache
* ? grep -Fx "${MESSAGEBODYDIGEST}" ${HOME}/.digestcache
{
 DIGESTDUPLICATE=yes
 LOG="[$$]$_: Found ${MESSAGEBODYDIGEST} cached. Message-ID: ${MESSAGEID}${NL}"
}

:0 E
{
 # Keep only the last 500 lines in ${HOME}/.digestcache
 DIGESTCACHE=` echo "${MESSAGEBODYDIGEST}" ; head -qn 499 ${HOME}/.digestcache `

 # Alternate method to keep the last 500 lines if you don't have the "head" command.
 # DIGESTCACHE=`(echo "${MESSAGEBODYDIGEST}" ; cat ${HOME}/.digestcache )|sed -e '501,$d'`

 # Write the updated DIGESTCACHE information back to the cache file.
 :0 Wic: ${HOME}/.digestcache.lock
 |( ${FORMAIL} -X "" -I "" ; echo "${DIGESTCACHE}" ) >${HOME}/.digestcache

 :0 a
 { LOG="[$$]$_: Wrote ${MESSAGEBODYDIGEST} to ${HOME}/.digestcache ${NL}" }

 :0 E
 { LOG="[$$]$_: Error updating ${HOME}/.digestcache with ${MESSAGEBODYDIGEST}${NL}" }
}

# Test the DIGESTDUPLICATE variable we just created to see if it contains a "yes".
# If so, log the event and file the duplicate email in /dev/null
:0
* DIGESTDUPLICATE ?? ^yes$
{
 LOG="[$$]$_: Digest Cache - Duplicate Message detected. Filed in /dev/null${NL}"

 :0
 /dev/null
}


# Sample procmail recipe that will do a lookup on the
# SpamCop block list, and tag the email if the IPv4 address
# is listed.

:0
* IPV4VALID ?? ^yes$
* ! SOURCEIPREV ?? ^000\.000\.000\.000$
{ SPAMCOPBUFFER=`host ${SOURCEIPREV}.bl.spamcop.net` }

:0
* IPV4VALID ?? ^yes$
* SPAMCOPBUFFER ?? 127\.0\.0\.2$
{
 SPAMCOPLISTED=yes
 LOGSTRING="Found ${SOURCEIP} listed in SpamCop. See: http://spamcop.net/bl.shtml?${SOURCEIP}"
 LOG="[$$]$_: ${LOGSTRING}${NL}"

 :0 wf
 |${FORMAIL} -A "X-blocklists: ${LOGSTRING}"
}

# Here is another example of a more complex blocklist lookup technique
# which will look up an IP on zen.spamhaus.org, decode the response, and
# tag the email.

# References:
# http://www.spamhaus.org/zen/index.lasso
# http://www.spamhaus.org/faq/answers.lasso?section=DNSBL%20Technical#200

SPAMHAUSLOOKUP=`host ${SOURCEIPREV}.zen.spamhaus.org`

* SPAMHAUSLOOKUP ?? 127\.0\.0\.([2-9]|1[01])$
 # SBL Spamhaus Maintained
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.2$

 # --- reserved for future use
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.3$

 # XBL CBL Detected Address
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.4$

 # XBL NJABL Proxies (customized)
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.5$

 # XBL reserved for future use
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.6$

 # XBL reserved for future use
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.7$

 # XBL reserved for future use
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.8$

 # --- reserved for future use
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.9$

 # PBL ISP Maintained
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.10$

 # PBL Spamhaus Maintained
 * SPAMHAUSLOOKUP ?? 127\.0\.0\.11$
 { SPAMHAUSLOG="${SPAMHAUSLOG}PBL-SpamHaus Maintained, " }

 SPAMHAUSLOG=`echo "${SPAMHAUSLOG}" |sed -e "s/, $/\n\tSee: http:\/\/www\.spamhaus\.org\/query\/bl\?ip=${SOURCEIP}/"`
 LOG="[$$]$_: Result codes: ${SPAMHAUSLOG}${NL}"

 :0 f
 |${FORMAIL} -A "X-blocklists: ${SOURCEIP} found in SpamHaus. Blocklist lookup results: ${SPAMHAUSLOG}"

# Check the SPAMCOPLISTED variable; if yes, file the email
:0
* SPAMCOPLISTED ?? ^yes$
{
 LOGSTRING="Blacklisted by SpamCop. Email delivered to ${SPAMFOLDER}"
 LOG="[$$]$_: ${LOGSTRING}${NL}"

 :0: ${MAILDIR}/spamfolder.lock
 |${FORMAIL} -A "X-folder: Spam. ${LOGSTRING}" >>${SPAMFOLDER}
}

# Check the SPAMHAUSLISTED variable; if yes, file the email
:0
* SPAMHAUSLISTED ?? ^yes$
{
 LOGSTRING="Blacklisted by SpamHaus. Email delivered to ${SPAMFOLDER}"
 LOG="[$$]$_: ${LOGSTRING}${NL}"

 :0: ${MAILDIR}/spamfolder.lock
 |${FORMAIL} -A "X-folder: Spam. ${LOGSTRING}" >>${SPAMFOLDER}
}

# File any mails that pass our tests in the default inbox.

LOG="[$$]$_: Email delivered to ${DEFAULT}${NL}"

:0:
|${FORMAIL} -A "X-Folder: Default" >>${DEFAULT}

---CUT HERE---
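The reverse IPv4 step used for the blocklist lookups can also be tried on its own from a command line. The following is only a minimal sketch, assuming a POSIX shell; the function name reverse_ipv4 is made up for this example, and it checks only that each quad is in the range 0 to 255, not whether the address is routable.

```shell
#!/bin/sh
# reverse_ipv4 ADDR: print ADDR with its four quads in reverse order,
# or return 1 if ADDR is not a dotted quad with each part in 0-255.
# (Hypothetical helper; not part of procmail or formail.)
reverse_ipv4() {
    # Reject anything but digits and dots, and stray leading/trailing dots.
    case $1 in
        ''|*[!0-9.]*|.*|*.) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1               # split the address into quads on the dots
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for quad in "$@"; do
        case $quad in
            ''|????*) return 1 ;;   # empty, or more than three digits
        esac
        [ "$quad" -le 255 ] || return 1
    done
    printf '%s.%s.%s.%s\n' "$4" "$3" "$2" "$1"
}
```

For example, host "$(reverse_ipv4 "$SOURCEIP").bl.spamcop.net" would build the same kind of lookup name that the SpamCop recipe passes to the host command.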

Other procmail references (Unix®/Linux man pages):

   man procmail    - Manual page describing procmail in general
   man procmailrc  - Manual page describing the procmail run control file
   man procmailsc  - Manual page describing procmail scoring
   man procmailex  - Manual page showing various very basic procmail examples


© 2006, 2007 by Garen L. Erdoisa - All Rights Reserved
Page last updated: Monday, Dec 24, 2007
Use this information at your own risk.
Contact Info:"Garen Erdoisa" <scamper@trisk.com>
URL: http://www.trisk.com/scamper/procmailtips.html