From my past experience coding and the way the attack emotes look, I'm guessing they are quite a bit more complex than just a list of canned messages, and are instead assembled from fragments. The way I imagine it works for martial arts is as follows:
First, determine by posture whether an attack is even made and, if not, give an appropriate posture message from a small collection of possible messages.
Second, determine the type of attack that will be performed.
Third, if it is a targeted attack and not just a full body attack like the throw, determine hit location with a different hit location table for each attack type.
Fourth, determine the effectiveness of the attack.
Finally, display a message assembled from the pieces above: attacker, action or attack type, target if there is an attack, hit location if the attack is targeted, effectiveness of the attack, and any random closing action or flavor tacked onto the end of the emote (like stepping back to ready for the next attack).
The code also has to do the math and random number checks for each stage and apply the results to the fatigue and damage systems, which in turn can apply their own adjustments and messages, like fatigue or damage affecting cleanliness, description message changes, etc. It is very unlikely that it just picks from a long list of possible messages. Note that all of this assumes the attack is successful; failed attacks would generate their own whole message assembly tree, covering whether or not it would have hit, whether it was dodged, blocked or deflected, stopped by armor, etc.
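To make the idea concrete, here is a rough Python sketch of the steps above for a successful attack. Everything in it is my own guess for illustration: the tables, the names (POSTURE_MESSAGES, ATTACKS, build_attack_emote, and so on), the thresholds and the damage numbers are all made up, and the real game code almost certainly looks different.

```python
import random

# All tables and numbers here are hypothetical; the game's real data is unknown.
POSTURE_MESSAGES = {
    "prone": ["{attacker} struggles to get back up.",
              "{attacker} rolls aside, unable to strike."],
}
# verb -> True if the attack targets a body part, False for full-body attacks
ATTACKS = {"punches": True, "kicks": True, "throws": False}
# a separate hit location table for each targeted attack type
HIT_LOCATIONS = {
    "punches": ["jaw", "ribs", "stomach"],
    "kicks": ["knee", "thigh", "ribs"],
}
# (upper bound of roll, phrase fragment, damage passed to the damage system)
EFFECTS = [(0.33, "glancing off", 1),
           (0.66, "landing solidly", 3),
           (1.0, "connecting with brutal force", 6)]
CLOSERS = ["", " and steps back, ready for the next attack",
           " and circles to the left"]


def build_attack_emote(attacker, target, posture, rng):
    """Return (message, damage) for one successful martial-arts attack."""
    # First: posture may rule out an attack entirely.
    if posture in POSTURE_MESSAGES:
        template = rng.choice(POSTURE_MESSAGES[posture])
        return template.format(attacker=attacker), 0
    # Second: pick the attack type.
    verb = rng.choice(sorted(ATTACKS))
    # Third: targeted attacks get a location from their own table.
    location = rng.choice(HIT_LOCATIONS[verb]) if ATTACKS[verb] else None
    # Fourth: roll for effectiveness.
    roll = rng.random()
    phrase, damage = next((p, d) for limit, p, d in EFFECTS if roll <= limit)
    # Finally: glue the fragments together and add an optional closer.
    message = f"{attacker} {verb} {target}"
    if location:
        message += f" in the {location}"
    message += f", {phrase}{rng.choice(CLOSERS)}."
    return message, damage


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(build_attack_emote("Bob", "Ann", "standing", rng))
```

Even this toy version produces dozens of distinct emotes from a handful of short tables, which is why I think fragments are more likely than one long list. The returned damage value stands in for the hand-off to the damage and fatigue systems, and failed attacks would need their own branch with its own fragment tables.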
Putting in a whole new combat system or weapon type would be a pretty involved process.