chat-filter
Basic chat filter preventing bypasses
How it works
Once a user sends a message, the entire message is checked for profanity and links. A word only has to be 80% similar to a banned word for it to be blocked. This is pretty hit-and-miss currently, so I plan to implement a list of whitelisted words too.
To check string similarity, a combination of 4 different mathematical strategies is used.
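As a rough illustration of the 80% rule and the planned whitelist, here is a minimal Python sketch. It is not the filter's actual code: the names `BANNED_WORDS`, `WHITELIST` and `is_blocked` are hypothetical, the word lists are placeholders, and Python's standard `difflib` stands in for the similarity library the project really uses.

```python
from difflib import SequenceMatcher

BANNED_WORDS = {"heck"}   # placeholder list, not the project's real one
WHITELIST = {"check"}     # hypothetical allowed words that resemble banned ones
THRESHOLD = 0.8           # the 80% cut-off described above


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (difflib stand-in)."""
    return SequenceMatcher(None, a, b).ratio()


def is_blocked(word: str) -> bool:
    """Block a word if it is at least THRESHOLD similar to any banned word."""
    word = word.lower()
    if word in WHITELIST:
        return False  # explicitly allowed, skip the similarity check
    return any(similarity(word, banned) >= THRESHOLD for banned in BANNED_WORDS)
```

With these placeholder lists, "check" scores above 80% against "heck", so it would be blocked without the whitelist entry. That is the kind of false positive a whitelist is meant to prevent.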
These algorithms are implemented via a string similarity library found here.
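The four strategies are not named here, so the following Python sketch only illustrates the general idea of combining several similarity measures into one score. The three measures chosen (difflib's ratio, a Dice coefficient over character bigrams, and a normalised Levenshtein distance) and the plain average are assumptions made for the example, not the library's actual method.

```python
from difflib import SequenceMatcher


def bigrams(s: str) -> set:
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def levenshtein_ratio(a: str, b: str) -> float:
    """1 minus (edit distance divided by the longer string's length)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution (free if characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b))


def combined_similarity(a: str, b: str) -> float:
    """Unweighted average of the three measures (an assumed combination rule)."""
    scores = [SequenceMatcher(None, a, b).ratio(), dice(a, b), levenshtein_ratio(a, b)]
    return sum(scores) / len(scores)
```

Averaging several measures smooths out the blind spots of any single one, at the cost of running every measure for every comparison.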
Current efficiency
A fully clean message of 100 characters takes approximately 120ms to check. This is the worst-case scenario.
This filter algorithm has a time complexity of O(n): the time taken increases linearly with the length of the message.
As much as efficiency is important, I've mainly been focusing on accuracy and on making the filter as hard as possible to bypass. This is accomplished by removing any separator characters (such as spaces, underscores, hyphens, etc.) and checking different combinations of the string, replacing common number->letter bypasses too.
At the current time, this filter can detect a message such as:
"This guy is a F_4-6 90_T"
as the user attempting to say the word "faggot" with 100% confidence.
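The separator stripping and number->letter replacement described above could look roughly like this in Python. This is a sketch under assumptions: the `LEET` table, the `SEPARATORS` pattern and the `candidates` function are hypothetical names, and the real filter may use a different mapping.

```python
import re
from itertools import product

# Hypothetical number->letter table; "1" is ambiguous, so it maps to two letters.
LEET = {"0": "o", "1": "il", "3": "e", "4": "a", "5": "s", "6": "g", "7": "t", "9": "g"}
SEPARATORS = re.compile(r"[\s_\-]+")  # spaces, underscores, hyphens


def candidates(message: str) -> set:
    """Return every de-obfuscated reading of the message."""
    stripped = SEPARATORS.sub("", message.lower())
    # Each character becomes a string of possible letters; plain characters map to themselves.
    options = [LEET.get(ch, ch) for ch in stripped]
    return {"".join(combo) for combo in product(*options)}
```

For example, `candidates("H_3-C K")` yields the single reading `heck`, while `candidates("h1")` yields both `hi` and `hl`. Each ambiguous digit multiplies the number of readings, so a real implementation would want to cap how many combinations it checks.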
There is an option to turn string similarity checking off. The filter then relies on the regex alone, which still detects all of the bypasses described above but cannot block slightly misspelled words. The benefit of relying only on the regex is that a full message takes only approximately 1ms to check.
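To show how a regex on its own can catch separator and number->letter bypasses but not misspellings, here is a minimal Python sketch. The `SUBS` table, the `GAP` pattern and `build_pattern` are hypothetical and only illustrate the idea; the filter's real regex is not shown in this README.

```python
import re

# Hypothetical table: each letter plus the digits commonly used in its place.
SUBS = {"a": "a4", "e": "e3", "g": "g69", "i": "i1", "l": "l1", "o": "o0", "s": "s5", "t": "t7"}
GAP = r"[\s_\-]*"  # optional separators between letters


def build_pattern(word: str) -> re.Pattern:
    """Build a regex matching the word with separators and digit swaps."""
    parts = []
    for ch in word.lower():
        chars = SUBS.get(ch, ch)
        parts.append("[" + re.escape(chars) + "]")  # one character class per letter
    return re.compile(GAP.join(parts), re.IGNORECASE)


pattern = build_pattern("heck")  # placeholder banned word
```

Here `pattern.search("what the H_3-c k")` finds a match, but `pattern.search("hecck")` does not, which is the trade-off described above: the regex is fast but only sees exact spellings and the substitutions it was built with.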
Updates to be done
- Storing options externally, most likely in a MongoDB database
- Further optimisations to cut the time taken to execute down as much as possible
Terms
This source code is public only so that other developers can review it, and to showcase my work. If you wish to use any of the source code written by me, you must have my clear permission to do so. Once you have permission, however, you do not need to credit, reference or otherwise cite my name in your project (but it would be appreciated if you did).