Filtering

Filters
Scribe currently has 2 filtering systems: a user-specified filter list and a Bayesian spam filter. They mostly work separately. If enabled, the Bayesian filter runs first and checks incoming mail for spam. Mail that passes that stage is then run through the normal user-specified filters, which may sort, delete, mark or label email.

To create a new user filter, use the Filters -> New Filter menu. This opens a filter edit window that lets you set the conditions and actions of the filter. The first tab of the filter window has a name, to describe what the filter does, and a couple of buttons to change the order that the filters run in. If you select the filters folder in the main window and set the sorting to "Index - Descending" then the buttons allow you to move the filter up and down. Filters are executed in ascending order, from 1 through to the last number, unless some filter asks for further processing to stop. This means that multiple filters may match the same email and run their actions on it.

i.Scribe: Only the first filter runs when you receive mail. This is just to demonstrate the feature that is in InScribe, enough so that you can see how it works.

The user filtering system is based around the Scribe DOM, which is a system for specifying fields within Scribe objects via text labels. It's quite simple once you get the hang of it, and the menus in the filter's user interface offer some shortcuts to get you started. For example, you might want to write a filter that checks the value of an incoming email's "from" header. For that you'd use a DOM field like this:

mail.From
Where "mail" is an object (of type Object::Mail), i.e. the incoming email, and "From" is the field within the object. However, if you study the DOM, you'll see that the From field of a mail object is of type Object::Address, which itself has separate sub fields. So you could then specify a more precise DOM field to achieve something like querying just the name of the person sending the email:
mail.From.Name
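
To make the idea of a dotted DOM field concrete, here is a minimal Python sketch of how a path such as mail.From.Name could be resolved one segment at a time. The Address and Mail classes, the Email and Subject fields, and the resolve function are hypothetical stand-ins for illustration, not Scribe's own types or implementation. Case-insensitive field matching is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Hypothetical stand-in for Object::Address."""
    Name: str
    Email: str


@dataclass
class Mail:
    """Hypothetical stand-in for Object::Mail."""
    From: Address
    Subject: str


def resolve(path, root_objects):
    """Walk a dotted path such as 'mail.From.Name' one segment at a time."""
    first, *rest = path.split(".")
    current = root_objects[first.lower()]
    for segment in rest:
        # Assumption: field names are matched case-insensitively.
        fields = {name.lower(): name for name in vars(current)}
        current = getattr(current, fields[segment.lower()])
    return current


message = Mail(From=Address(Name="Ann Example", Email="ann@example.com"),
               Subject="Hello")
sender = resolve("mail.From", {"mail": message})            # an Address object
sender_name = resolve("mail.From.Name", {"mail": message})  # a plain string
```

The point of the sketch is that each extra segment simply steps into a sub field of whatever the previous segment returned.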

This way of doing things is surprisingly powerful if you want to write some complicated filters. OK, so here are a few examples of what can be done with DOM fields. Firstly, you can select a single header out of the incoming mail's headers using:

mail.InternetHeader[<header-name>]
Which is useful if some upstream mail processor has added a header that you want to filter on. Then there is the From field of the mail, which has a Contact sub field that links to the local Contact database if the address's email matches a local Contact. If someone in your Contacts emails you then you can access all their Contact record data from the filtering system like this:
mail.From.Contact.Folder
This returns the path of the folder that the contact is stored in. So you could, if you wanted to, put your contacts in different sub folders and then filter incoming mail based on which folder the contact is in. I'm sure you can think of applications for that ;)
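
As an illustration of what picking a single header out of a message amounts to, the following sketch uses Python's standard email package. It is not how Scribe does it internally. The X-Example-Scanner header and the internet_header function are made-up names for the example.

```python
from email import message_from_string

# A tiny raw message; X-Example-Scanner is a hypothetical header that an
# upstream mail processor might have added.
RAW = """\
From: Ann Example <ann@example.com>
Subject: Hello
X-Example-Scanner: suspicious

Body text.
"""


def internet_header(raw_message, header_name):
    """Return one header's value, or None when the header is absent."""
    return message_from_string(raw_message).get(header_name)


scanner_verdict = internet_header(RAW, "X-Example-Scanner")
```

A filter condition could then compare that single value instead of searching the whole header block.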

The contact record even has a bunch of custom fields that you can name yourself and assign any value to. These can be used to filter on as well, for example to track customer numbers or to group contacts in a way specific to your own needs.

There are several things that you can do with filters beyond just filtering the incoming mail. Different things may be accomplished by using different sets of filters. In this case it's useful to know that only the filters in /Filters are used in the filtering process, and filters in sub-directories are not. This means that you can "switch off" filters by putting them into a sub-directory of /Filters.
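
The "switch off" rule can be pictured as a simple check on where a filter lives. This is only a sketch: the is_active function and the filter names are hypothetical, and the paths are treated as plain slash-separated strings.

```python
from posixpath import dirname


def is_active(filter_path, root="/Filters"):
    """A filter takes part in filtering only if it sits directly in the root."""
    return dirname(filter_path) == root


enabled = is_active("/Filters/Move newsletters")      # directly in /Filters
disabled = is_active("/Filters/Disabled/Old rule")    # parked in a sub-directory
```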

If you have a folder of mail that you want to process using filters, you can do so by selecting the folder in the tree view and then clicking the Filters->Filter the Current Folder menu. This runs the filters over all the email in the current folder as if you had just received it. This is usually known as "filtering after the fact" in some other mail apps. It's handy if something you just received ended up in your inbox instead of being filtered: you can go and adjust the filter, then re-run the filters over the inbox.

On a related note, there is another feature that may prove useful when filtering folders. If you right click on a folder, there is an option to "Collect all mail from Sub-Folders". This moves all email in the current folder's sub-folders, into the currently selected folder.

Before the actions of a filter can run, the conditions of the filter must be met. The filter has a list of conditions that are either OR'd together or AND'd together to return TRUE or FALSE. The option to use AND or OR is at the bottom of the conditions tab in the filter window.
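
In code terms, combining the condition list comes down to something like the sketch below. The conditions_met function is a hypothetical name, and how Scribe treats an empty condition list is not stated here, so the sketch makes no claim about that case.

```python
def conditions_met(results, combine="AND"):
    """Combine per-condition TRUE/FALSE results the way the AND/OR option does."""
    if combine == "AND":
        return all(results)     # every condition must be TRUE
    if combine == "OR":
        return any(results)     # one TRUE condition is enough
    raise ValueError(f"unknown combine mode: {combine!r}")


mixed = [True, False]
and_result = conditions_met(mixed, "AND")
or_result = conditions_met(mixed, "OR")
```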

The conditions list is set up as a set of records, where you create and delete conditions with the buttons "New" and "Delete". You can seek along the set of records with the scrollbar.

To configure a condition, choose the field. This is any valid DOM field, and isn't limited to the list in the drop down box.

Then choose the operator. Most are self-explanatory, but a few bear talking about. "Like" does a wildcard match, where the wildcard '*' matches any characters and '?' matches any single character. "Contains" does a substring search for the value.

You can invert the logic of the condition by using the NOT operator.
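
The difference between "Like", "Contains" and NOT can be sketched with Python's standard fnmatch module. This is an approximation rather than Scribe's matcher: fnmatchcase also understands [seq] character classes, which the description above does not mention, and case-sensitive matching is an assumption. The function names are made up for the example.

```python
from fnmatch import fnmatchcase


def like(value, pattern):
    """Wildcard match: '*' matches any run of characters, '?' exactly one."""
    return fnmatchcase(value, pattern)


def contains(value, needle):
    """Plain substring search."""
    return needle in value


def evaluate(value, operator, operand, negate=False):
    """Apply one operator, then flip the result if NOT is set."""
    operators = {"Like": like, "Contains": contains}
    result = operators[operator](value, operand)
    return not result if negate else result


from_example = evaluate("ann@example.com", "Like", "*@example.com")
not_invoice = evaluate("Weekly newsletter", "Contains", "invoice", negate=True)
```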

The drop down for the value field is entirely optional; it's just there to help select values of certain types. Most of the time you can just enter values directly.

Once the conditions are met, the actions are executed in order from first to last. If you would like this filter to be the last filter processed on this email, set the "Stop further processing of filters" option.
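
Putting the ordering rules together, the overall filter run might look roughly like this. It is a sketch under stated assumptions: the Filter record, its field names and run_filters are hypothetical, and a mail is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Filter:
    """Hypothetical filter record: an index, a test, actions and a stop flag."""
    index: int
    matches: Callable[[dict], bool]
    actions: List[Callable[[dict], None]]
    stop_processing: bool = False


def run_filters(filters, mail):
    """Run filters in ascending index order; return the indexes that matched."""
    matched = []
    for flt in sorted(filters, key=lambda f: f.index):
        if not flt.matches(mail):
            continue
        for action in flt.actions:      # actions run first to last
            action(mail)
        matched.append(flt.index)
        if flt.stop_processing:         # later filters are skipped
            break
    return matched


mail = {"subject": "Invoice 42", "labels": []}
filters = [
    Filter(2, lambda m: True, [lambda m: m["labels"].append("seen")]),
    Filter(1, lambda m: "Invoice" in m["subject"],
           [lambda m: m["labels"].append("billing")], stop_processing=True),
]
matched = run_filters(filters, mail)
```

Here filter 1 matches and stops processing, so filter 2 never runs even though its condition would also be true.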

The available actions are:

  • Move to folder
    Moves the email to the specified folder. Select the destination folder with "...".
  • Delete
    Deletes the email. Use the "..." button to configure where to delete from, either local folders, the server or both.
  • Print
    Prints the email using the current Printer setup (File -> Print Setup).
  • Play Sound
    Plays the specified sound. Select a sound with "...".
  • Open Email
    Opens the email in its own window.
  • Execute Process
    Executes a program, optionally with parts of the email as arguments. The program executed is given by the argument field. The available arguments are:
    • %exepath%
      The full path of the current Scribe executable.
    • %field(<name>)%
      A field from the body of the message. If you put mail-like fields in the body of the message, this can extract them and use them as arguments to the process.
    • %attachment%
      The first attachment as a file.
    • %body%
      The whole body as a file.
  • Set Colour
    Sets the mark colour of the message.
  • Set Read
    Sets the message as read. This means it doesn't show up in the new mail list.
  • Set Label
    Sets the label of the message.
  • Empty Folder
    Empties the folder given in the argument.
  • Mark As Spam
    Deletes the email as spam, which is the same as hitting the spam button on the main window's toolbar.
  • Reply
    Replies to the email with a given template. To set up a template, create a new mail in the /Templates folder by right clicking on the folder in the tree view and selecting "New Mail". Then fill out the body of the message you want to use as a reply and save the template. Finally, in the filter action, click the "..." button to set the template to use.

    You can use DOM fields in the template to reference the source email, e.g.:

    Dear <? mail.from.name ?>,
    I received your email on <? mail.datereceived ?>, thanks.
    
  • Forward
    Forwards the email onto a new recipient. Click the "..." button to set optional parameters. Otherwise just enter the email address to forward to. The forward action can also use a template and DOM fields like the reply action described above.
  • Bounce
    Bounces the email to the address specified in the argument box as if the message had originally come from the first sender instead of yourself. This means that any reply your recipient sends will go to the original sender, not you.
  • Save Attachment(s)
    Saves some or all of the attachments to a folder outside the mail storage. Click the "..." button to specify the folder to save to and the types of files to save. Use "*" to match all files, or use something like "*.gif *.jpg *.png" to match a subset of files.
  • Delete Attachment(s)
    This action deletes attachments from an email. The argument specifies the types of files to delete. This can be "*" to match all files, or something like "*.gif *.jpg *.png" to match a subset of files.
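
The <? ... ?> placeholders used by the Reply and Forward templates above can be pictured as a simple text substitution. The sketch below is an illustration only: expand_template is a hypothetical name, the field values are supplied as a plain dictionary rather than looked up in the Scribe DOM, and the date format is invented. Leaving unknown fields untouched is an assumption too.

```python
import re

# Matches placeholders of the form <? some.dom.field ?>
PLACEHOLDER = re.compile(r"<\?\s*([\w.]+)\s*\?>")


def expand_template(template, fields):
    """Replace each <? field ?> placeholder with its value from `fields`."""
    def substitute(match):
        key = match.group(1).lower()
        # Assumption: unknown fields are left untouched rather than removed.
        return str(fields.get(key, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


template = ("Dear <? mail.from.name ?>,\n"
            "I received your email on <? mail.datereceived ?>, thanks.")
reply = expand_template(template, {
    "mail.from.name": "Ann Example",
    "mail.datereceived": "2004-03-01",
})
```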

The script tab overrides both the conditions and actions tabs: if you enter a script there, Scribe assumes you want to check the conditions in script instead of using the limited filter conditions.

The documentation on scripting is here.

Bayesian Spam Filter

Firstly, I'll refer to anything that isn't spam as "ham".

Basically the first stage is to collect the spam as it arrives and "tag" it for what it is by using the "Delete As Spam" button in the toolbar. You should create a subdirectory off "Mailbox" called "Spam" if it doesn't already exist.

Once you have a little bit of spam collected, switch the filtering into "Training mode" using the Filters->Bayesian Filtering Options dialog. I set the "probably" folder to "/Spam/Probably" so I can check it easily for false positives.

Then run the Filters->Build Word Lists command which will iterate through all your mail and build a database of words, both good and bad. As a side effect of this a whitelist is generated from the "from" address of all the email not in the Spam folder.
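
Conceptually, building the word lists is a counting pass over two piles of mail. The sketch below shows the idea in Python. The tokenizer, the (from_address, body) tuple layout and the function names are assumptions made for the example, not a description of Scribe's word database format.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD.findall(text.lower())


def build_word_lists(ham_mails, spam_mails):
    """Count words per class and collect a whitelist of ham senders.

    Each mail is a (from_address, body) tuple.
    """
    ham_words, spam_words, whitelist = Counter(), Counter(), set()
    for sender, body in ham_mails:
        ham_words.update(tokenize(body))
        whitelist.add(sender.lower())   # senders of mail not in the Spam folder
    for _sender, body in spam_mails:
        spam_words.update(tokenize(body))
    return ham_words, spam_words, whitelist


ham = [("ann@example.com", "Lunch meeting tomorrow"),
       ("bob@example.com", "Meeting notes attached")]
spam = [("x@spam.example", "Cheap pills, cheap watches")]
ham_words, spam_words, whitelist = build_word_lists(ham, spam)
```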

Now as you begin to receive mail the filter will start classifying it into Spam and Ham. The Spam is put in the "probably" folder (whatever you configured that to be). The new mail functions are not triggered when you receive Spam, which is nice because it won't distract you from what you're doing.

Every now and then go through your "probably" folder and "delete as spam" the contents (minus any false positives of course).

Also, at this point the word database is not updated automatically; you have to re-run the Filters->Build Word Lists command every few days to keep it up to date. The problem with keeping the word database up to date is not adding the Spam mail to the spam word database but deciding when to add the Ham to the ham database. For instance, if you receive a mail that is a false negative (i.e. a Spam classified as Ham) then you would read the email and then "Delete As Spam". If it's a ham you leave it or move it into another folder. So some time after the mail is read and hasn't been "Deleted as Spam" it needs to be added to the Ham word list. But there is no explicit event to attach this action to. I thought about using a timeout, i.e. if you haven't deleted it as spam within x minutes of reading the mail then it's probably Ham, right? But I can think of cases where that will fail. What I don't want is having to classify every ham as "Ham". That's just adding too much work to the daily routine. Another option would be to automatically add every mail to the ham db and then, if the user clicks "Delete As Spam" on it, decrement the ham db word counts and increment the spam word counts. But that's double handling, which is inefficient. So I am sticking with the manual "build word lists" for the moment until I can resolve this issue.
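
For reference, the "add everything to ham, then move it on Delete As Spam" option mentioned above (which is not what Scribe currently does) would look something like this sketch. The reclassify_as_spam function and the use of Python Counter objects as word lists are assumptions for illustration.

```python
from collections import Counter


def reclassify_as_spam(words, ham_words, spam_words):
    """Move a mail's word counts from the ham list to the spam list."""
    ham_words.subtract(words)       # undo the earlier automatic ham update
    spam_words.update(words)        # count the same words as spam instead
    # Drop entries that fell to zero or below so the ham list stays clean.
    for word in [w for w, n in ham_words.items() if n <= 0]:
        del ham_words[word]


# The mail's words ("offer", "now") were added to the ham list on arrival.
ham_words = Counter({"offer": 1, "now": 1, "meeting": 3})
spam_words = Counter({"offer": 4})
reclassify_as_spam(["offer", "now"], ham_words, spam_words)
```

Every misclassified mail is counted twice this way, once into each list, which is the double handling described above.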

Once your incoming mail is being sorted correctly, i.e. you're getting no false positives and the false negatives are quite low, put the filter into Live mode and remove the "probably" folder. The filter will "delete as spam" anything that gets a spammy score of 0.9 or above. From experience, a few hundred spam is all that's necessary to make the filter work.
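
To give a feel for what a "spammy score" is, here is a sketch of one common way Bayesian filters combine per-word probabilities into a single number. Only the 0.9 threshold comes from the text above. The formula, the clamping to 0.01 and 0.99, the neutral 0.5 for unseen words and all function names are assumptions, and Scribe's actual calculation may differ.

```python
from collections import Counter
from math import prod

SPAM_THRESHOLD = 0.9  # from the text: 0.9 or above is deleted as spam


def word_probability(word, ham_words, spam_words, ham_total, spam_total):
    """Estimate how spammy one word is, clamped away from 0 and 1."""
    ham_rate = ham_words[word] / ham_total
    spam_rate = spam_words[word] / spam_total
    if ham_rate + spam_rate == 0:
        return 0.5                      # unseen word: neutral (assumed)
    return min(0.99, max(0.01, spam_rate / (ham_rate + spam_rate)))


def spam_score(words, ham_words, spam_words, ham_total, spam_total):
    """Combine per-word probabilities into one score between 0 and 1."""
    probs = [word_probability(w, ham_words, spam_words, ham_total, spam_total)
             for w in words]
    spammy = prod(probs)
    hammy = prod(1 - p for p in probs)
    return spammy / (spammy + hammy)


# Toy word lists built from 10 ham and 10 spam mails (made-up numbers).
ham_words = Counter({"meeting": 4, "lunch": 2})
spam_words = Counter({"cheap": 5, "pills": 3, "meeting": 1})
junk = spam_score(["cheap", "pills"], ham_words, spam_words, 10, 10)
real = spam_score(["lunch", "meeting"], ham_words, spam_words, 10, 10)
is_spam = junk >= SPAM_THRESHOLD
```

Words seen only in spam pull the score towards 1, words seen only in ham pull it towards 0, and the threshold decides which side a mail lands on.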

The spam sitting in the spam folder needs to stay there in the current implementation. You can't delete them because when the word database is rebuilt it will scan that folder, and if the words aren't there then the spam word file will be empty, and thus the filter will stop working.

Generally you should expect efficiency in the order of about 98% or a little better with a well populated folder and no viruses. Viruses tend to skew the results towards the "spam" side of things. I often find it easier to filter out viruses by using user filters, as they don't have enough text in the message to be filtered effectively by the Bayesian filter.


© 1996-2004 Matthew Allen