Built on Go, ChatEngine has 65,000 possible responses and counting.
At its core, ChatEngine is very simple. It takes an input message and matches it against a regular expression that determines whether the input is a greeting. If it is, a greeting is returned and the process ends. If it is not, the message is preprocessed into a normalized format to make the matcher's job easier.
The steps include:
- Punctuation fixes
- Spelling fixes
- Profanity filter
- CJK (Chinese, Japanese, Korean) detection
- Whitespace simplifier
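As a rough illustration, the greeting check and a few of the preprocessing steps above could look like the sketch below. The greeting pattern, the function names, and the exact rules (for example, collapsing `???` to `?`) are assumptions made for this example, not ChatEngine's actual code. The spelling and profanity steps are left out because they depend on word lists that are not shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// greetingRe is a hypothetical pattern; ChatEngine's real expression is not shown.
var greetingRe = regexp.MustCompile(`(?i)^\s*(hi|hello|hey)\b`)

// repeatedPunct matches runs of two or more '!', '?' or '.' characters.
var repeatedPunct = regexp.MustCompile(`[!?.]{2,}`)

// isGreeting reports whether s starts with one of the assumed greeting words.
func isGreeting(s string) bool {
	return greetingRe.MatchString(s)
}

// isCJK reports whether s contains at least one Han, Hiragana,
// Katakana, or Hangul rune.
func isCJK(s string) bool {
	for _, r := range s {
		if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) {
			return true
		}
	}
	return false
}

// fixPunctuation collapses a run such as "!!!" to its first character.
func fixPunctuation(s string) string {
	return repeatedPunct.ReplaceAllStringFunc(s, func(m string) string {
		return m[:1]
	})
}

// simplifyWhitespace trims both ends and collapses every internal
// whitespace run to a single space.
func simplifyWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// normalize applies the punctuation and whitespace steps in order.
func normalize(s string) string {
	return simplifyWhitespace(fixPunctuation(s))
}

func main() {
	fmt.Println(isGreeting("Hello there"))
	fmt.Println(normalize("  What   is  GO??? "))
	fmt.Println(isCJK("こんにちは"))
}
```

Keeping each step as a small `func(string) string` makes it easy to reorder the steps or test them one at a time.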
After preprocessing, the string is handed over to the matcher. The matcher iterates through all stored messages and compares the input with each one using the Levenshtein distance algorithm. Levenshtein distance is relatively fast, but it counts character-level edits rather than capturing meaning, so it is only a rough indicator of a good match; the abundance of messages makes up for this. In the future, usage of full NLP is planned, as well as a smarter matching algorithm.
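The matching step can be sketched as a linear scan that keeps the message with the smallest edit distance. This is a minimal illustration under assumptions: `levenshtein`, `bestMatch`, and the sample messages are names and data invented for this example, and the real matcher may differ.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, counted in runes,
// using the standard two-row dynamic programming formulation.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// bestMatch scans every message and returns the closest one together with
// its distance. It returns ("", -1) when messages is empty.
func bestMatch(input string, messages []string) (string, int) {
	best, bestDist := "", -1
	for _, m := range messages {
		if d := levenshtein(input, m); bestDist < 0 || d < bestDist {
			best, bestDist = m, d
		}
	}
	return best, bestDist
}

func main() {
	messages := []string{"what is your name", "how are you", "goodbye"}
	match, d := bestMatch("how are yuo", messages)
	fmt.Println(match, d)
}
```

Each comparison costs time proportional to the product of the two string lengths, and the scan performs one comparison per stored message, which is why the total cost grows with the size of the message set.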
As the number of messages grows, iterating through every message will also become impractical. Planned solutions are parallelism, binary search, skipping, and Markov chains.
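Of the planned solutions, parallelism is the easiest to sketch: split the messages into chunks, scan each chunk in its own goroutine, and merge the per-chunk winners. The code below is one possible shape, not the planned implementation. `parallelBest`, `result`, and the `dist` parameter are hypothetical names, and `lengthDiff` is a deliberately trivial stand-in for a real distance function such as Levenshtein.

```go
package main

import (
	"fmt"
	"sync"
)

// result records the index of a candidate message and its distance.
type result struct {
	index, dist int
}

// parallelBest scans messages in up to `workers` goroutines and returns the
// index of the closest message according to dist, or -1 if messages is empty.
// Ties are broken in favor of the lowest index so the result is deterministic.
func parallelBest(input string, messages []string, workers int, dist func(a, b string) int) int {
	if len(messages) == 0 {
		return -1
	}
	if workers < 1 {
		workers = 1
	}
	chunk := (len(messages) + workers - 1) / workers // ceiling division

	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []result
	)
	for start := 0; start < len(messages); start += chunk {
		end := start + chunk
		if end > len(messages) {
			end = len(messages)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			// Find the best candidate inside this chunk only.
			local := result{start, dist(input, messages[start])}
			for i := start + 1; i < end; i++ {
				if d := dist(input, messages[i]); d < local.dist {
					local = result{i, d}
				}
			}
			mu.Lock()
			results = append(results, local)
			mu.Unlock()
		}(start, end)
	}
	wg.Wait()

	// Merge the per-chunk winners.
	best := results[0]
	for _, r := range results[1:] {
		if r.dist < best.dist || (r.dist == best.dist && r.index < best.index) {
			best = r
		}
	}
	return best.index
}

// lengthDiff is a stand-in distance (absolute length difference) used only
// to keep this sketch short.
func lengthDiff(a, b string) int {
	d := len(a) - len(b)
	if d < 0 {
		d = -d
	}
	return d
}

func main() {
	messages := []string{"a", "abcd", "abcdefgh", "ab"}
	fmt.Println(parallelBest("abc", messages, 2, lengthDiff))
}
```

Parallelism only divides the work across cores; the total number of comparisons stays the same, so it complements rather than replaces the other planned approaches.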
Go was the ideal choice of programming language for ChatEngine.
It meets all the requirements:
- Easy to debug
I like Go because it offers great speed, comparable to Rust and C++, while preserving dynamism. You can pass functions around as values, just like in Python. It’s seamless. There are also plenty of great libraries for it.
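To make the point about functions as values concrete, here is a small sketch. The `step` type and `apply` helper are invented for this illustration and are not part of ChatEngine.

```go
package main

import (
	"fmt"
	"strings"
)

// step is any function that transforms a message.
type step func(string) string

// apply runs each step in order and returns the final string.
func apply(s string, steps ...step) string {
	for _, f := range steps {
		s = f(s)
	}
	return s
}

func main() {
	// A function literal stored in a variable, like a Python lambda.
	collapse := func(s string) string {
		return strings.Join(strings.Fields(s), " ")
	}
	// Library functions and local closures are passed the same way.
	fmt.Println(apply("  Hello   World ", strings.ToLower, collapse))
}
```

Unlike in Python, the function signature is checked at compile time, so a step with the wrong parameter or return type is rejected before the program runs.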