Nitty-gritties: how training works
If you’re asking yourself “Why does this thing say ‘Email not sorted automatically – training in progress’ when I’ve been using it for weeks?” then read on! And if you’re not asking that question but still want to know how Manuscript learns to zap your spam, read on too.
In a Nutshell:
Manuscript Autosort works only with Inquiries created from incoming mail. For the purposes of this document, we will call these Inquiries “emails”. Cases created manually in the Inbox Project, and incoming email that you’ve recategorized as something other than an Inquiry, are ignored by Autosort. Autosort will come out of ‘training mode’ and automatically sort incoming messages once at least 15 emails have been moved out of the Undecided area into another non-spam area. The emails must be sorted within the same project. See this post for more details.
This is easily explained with an example. Imagine you have just set up a mailbox and Inbox project with the following areas:
Since you have not received any email yet, Autosort is in ‘training mode’ and places every incoming email into Undecided. If you receive 20 emails:
Now, you manually move these emails into their appropriate areas:
The total number of emails in categories other than Spam and Undecided is at least 15, so Autosort will start to automatically categorize the next incoming email.
However, if you were to change your mind and move one of these cases around such that:
…then Autosort would return to ‘training mode’ and place all incoming emails into Undecided until you moved another one into Not Spam, Monkeys, Zebras, or any other category except Spam and Undecided.
Note that this means that even if you have received over 500 emails and diligently moved them as such:
…Autosort will still be in ‘training mode’, because the number of emails sorted into areas other than Spam and Undecided is still below the training threshold.
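The counting rule in this example can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Manuscript’s actual code: the function names, the dictionary layout, and the per-area counts are all made up for the sketch, and the threshold is assumed to be “at least 15”, as in the worked example.

```python
# Assumption: only emails in areas other than Spam and Undecided count
# toward training, and the threshold is "at least 15".
EXCLUDED_AREAS = {"Spam", "Undecided"}
TRAINING_THRESHOLD = 15


def trained_count(area_counts):
    """Count the emails sorted into areas other than Spam and Undecided."""
    return sum(n for area, n in area_counts.items() if area not in EXCLUDED_AREAS)


def in_training_mode(area_counts, threshold=TRAINING_THRESHOLD):
    """Return True while too few non-spam emails have been sorted."""
    return trained_count(area_counts) < threshold


# Hypothetical counts for one project (not the figures from the example above).
sorted_counts = {"Undecided": 0, "Spam": 5, "Not Spam": 10, "Monkeys": 3, "Zebras": 2}
one_moved_to_spam = {"Undecided": 0, "Spam": 6, "Not Spam": 9, "Monkeys": 3, "Zebras": 2}
mostly_spam = {"Undecided": 0, "Spam": 490, "Not Spam": 10}

print(in_training_mode(sorted_counts))      # 15 non-spam emails: trained
print(in_training_mode(one_moved_to_spam))  # 14 non-spam emails: training again
print(in_training_mode(mostly_spam))        # 500 emails, only 10 non-spam: training
```

Note that the total number of emails never enters the calculation. Only the emails sitting in non-spam areas count, which is why a mailbox with hundreds of sorted spam messages can still be in training.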
“Why all this nonsense about 15? Just have it start sorting immediately!” you say.
This feature exists to make sure Autosort knows enough about your non-spam email that there is minimal chance of a non-spam message being automatically categorized as spam. When we say “your email” we do mean “your”, and that’s the whole point of Bayesian spam filtering. Here at FogBugz, the word “mortgage” almost certainly has something like a 99.9% likelihood of appearing in an email that is spam. However, if you’re using Manuscript in the IT department of a mortgage lender, the word “mortgage” reveals, for all intents and purposes, nothing about whether the email that contains it is spam.
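To see why the same word can mean different things in different mailboxes, here is a minimal sketch of a per-word spam estimate of the kind used in textbook Bayesian filters. It is not Manuscript’s implementation; the function name and all of the counts are hypothetical, chosen only to mirror the “mortgage” example.

```python
def word_spam_probability(spam_with_word, spam_total, ham_with_word, ham_total):
    """Estimate how strongly one word points to spam, from one mailbox's history.

    Compares how often the word shows up in spam with how often it shows up
    in non-spam ("ham"). 0.5 means the word carries no evidence either way.
    """
    spam_rate = spam_with_word / spam_total if spam_total else 0.0
    ham_rate = ham_with_word / ham_total if ham_total else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # word never seen in this mailbox: no evidence
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical history for the word "mortgage" in two different mailboxes.
# A software company: the word shows up almost only in spam.
software_company = word_spam_probability(999, 1000, 1, 1000)
# A mortgage lender: the word is just as common in legitimate mail.
mortgage_lender = word_spam_probability(500, 1000, 500, 1000)

print(round(software_company, 3))  # close to 0.999
print(mortgage_lender)             # 0.5
```

The estimate depends entirely on the non-spam counts from your own mailbox, which is why Autosort needs enough sorted non-spam email before its guesses can be trusted.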