Spam Filtering: FogBugz Autosort Training



If you are wondering what the message "Email not sorted automatically - training in progress" means, or want to know how FogBugz learns to filter your spam, this article explains how Autosort is trained to filter your email.




Nitty-Gritty: How Training Works

FogBugz Autosort works only with Inquiries created from incoming mail. For this document, we will call these cases “emails”. Autosort ignores cases created manually in the Inbox project, as well as incoming email that you have given a different category. Autosort comes out of ‘training mode’ and starts sorting incoming messages automatically once more than 15 emails have been moved out of the Undecided area into another non-spam area. The emails must all be sorted within the same project.

This is easily explained with an example. Imagine you have just set up a mailbox and Inbox project with the following areas:


Since you have not received any email yet, Autosort is in ‘training mode’ and places every incoming email into Undecided. If you receive 20 emails:


Now, you manually move these emails into their appropriate areas:


The total number of emails in categories other than Spam and Undecided is at least 15, so Autosort will start to categorize the next incoming email automatically.

However, if you were to change your mind and move one of these cases around such that:


…then Autosort would return to ‘training mode’ and place all incoming emails into Undecided until you moved another one into Not Spam, Monkeys, Zebras, or any other category except Spam and Undecided.
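The counting rule in this example can be sketched in a few lines of Python. This is only an illustration, not FogBugz's actual code: the function names, the dictionary layout, and the per-area counts are hypothetical, and the threshold follows the "at least 15" reading used in the example above.

```python
# Illustrative sketch only; names and data layout are assumptions, not FogBugz internals.
EXCLUDED_AREAS = {"Undecided", "Spam"}  # areas that do not count toward training
TRAINING_THRESHOLD = 15                 # assumed from "at least 15" in the example


def trained_count(area_counts):
    """Count emails sorted into areas other than Undecided and Spam."""
    return sum(n for area, n in area_counts.items() if area not in EXCLUDED_AREAS)


def in_training_mode(area_counts, threshold=TRAINING_THRESHOLD):
    """Autosort stays in training mode while the non-spam count is below the threshold."""
    return trained_count(area_counts) < threshold


# Hypothetical counts for one project after sorting 20 emails by hand.
sorted_by_hand = {"Undecided": 0, "Spam": 5, "Not Spam": 5, "Monkeys": 5, "Zebras": 5}
print(in_training_mode(sorted_by_hand))  # False: 15 non-spam emails have been sorted

# Moving one email from Zebras to Spam drops the non-spam count to 14.
after_move = {"Undecided": 0, "Spam": 6, "Not Spam": 5, "Monkeys": 5, "Zebras": 4}
print(in_training_mode(after_move))      # True: back in training mode
```

Note that the size of the Spam area never enters the calculation, so sorting more spam does not bring Autosort out of training mode.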

This means that even if you have received over 500 emails and diligently moved them as such:


…Autosort will still be in ‘training mode’ because the number of emails sorted into non-spam areas is below the training threshold.

This threshold exists to ensure that Autosort knows enough about your email (not including spam) that there is minimal chance of a non-spam message being automatically categorized as spam. When we say “your email”, we really do mean yours; that is the whole point of Bayesian spam filtering. Here at FogBugz, the word “mortgage” has something like a 99.9% likelihood of appearing in an email that is spam. However, if you are using FogBugz in the IT department of a mortgage lender, the word “mortgage” reveals essentially nothing about whether the email that contains it is spam.
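To make the “mortgage” example concrete, here is a minimal sketch of how a per-word spam probability might be estimated from your own sorted mail. The formula is a common simplified Bayesian estimate, not necessarily the one Autosort uses, and the function name and all email counts are made up for illustration.

```python
# Simplified per-word estimate; an assumption for illustration, not Autosort's actual formula.
def word_spam_probability(spam_with_word, spam_total, ham_with_word, ham_total):
    """Estimate P(spam | word) from how often the word appears in spam vs. non-spam.

    Assumes both totals are greater than zero.
    """
    spam_rate = spam_with_word / spam_total  # fraction of spam containing the word
    ham_rate = ham_with_word / ham_total     # fraction of non-spam containing the word
    if spam_rate + ham_rate == 0:
        return 0.5                           # word never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical software company: "mortgage" almost never appears in real mail.
software_company = word_spam_probability(400, 1000, 1, 2500)
print(round(software_company, 3))  # 0.999

# Hypothetical mortgage lender: the word is just as common in real mail as in spam.
mortgage_lender = word_spam_probability(400, 1000, 1000, 2500)
print(round(mortgage_lender, 3))   # 0.5
```

The same word gives very different evidence depending on whose mail trained the filter, which is why Autosort needs a sample of your own non-spam email before it starts sorting.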

