Punctuation can be a beautiful thing, and a wonderful tool in the battle to achieve clarity in communications. But now it appears it can be a tool for evil, as well.
Security firm Symantec reports that spammers are now using a particular type of hyphen to make it easier to get URLs into messages and on web pages without being caught by filters designed to block known links to dubious sites.
The hyphen in question is the soft hyphen. When it is displayed, it looks exactly the same as a standard hyphen, but to a computer it marks an acceptable, and perhaps even preferred, place to split a word or phrase over two lines if needed.
That can be particularly useful in word processing and desktop publishing, as it avoids the software simply breaking up words to fit and, for example, replacing “therapist” with “the rapist”. (I wish I could say that was a hypothetical example…)
When it comes to HTML, there’s a dedicated code (&shy;) for the soft hyphen, but many browsers are set to hide the character unless it is actually used to break a word over two lines.
The spammers take advantage of this by inserting a soft hyphen in the middle of a URL. The viewer doesn’t see any difference in the address, meaning it looks legitimate (in the sense of being a real website address). The browser simply ignores the soft hyphen, meaning a click on the link takes the user to the intended address.
But a filter that relies on scanning for bogus links won’t necessarily realize that known&shy;badsite.com, with the soft hyphen code in the middle, is the same as knownbadsite.com, and thus won’t block the link. (That’s a slightly simplified explanation of the procedure, but the principle is the same.)
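To make the mismatch concrete, here is a minimal Python sketch of a naive filter being fooled. The blocklist, the function name and the example URLs are hypothetical and only for illustration; real spam filters are considerably more involved.

```python
# The soft hyphen is the Unicode character U+00AD.
SOFT_HYPHEN = "\u00ad"

# Hypothetical blocklist of known bad domains (illustrative only).
BLOCKLIST = {"knownbadsite.com"}


def naive_filter_blocks(text: str) -> bool:
    """Return True if a blocklisted domain appears verbatim in the text."""
    return any(domain in text for domain in BLOCKLIST)


plain = "http://knownbadsite.com/offer"
# Same address with an invisible soft hyphen inserted after "known".
disguised = "http://known" + SOFT_HYPHEN + "badsite.com/offer"

print(naive_filter_blocks(plain))      # True: exact match found
print(naive_filter_blocks(disguised))  # False: the extra character breaks the match
```

The two strings differ by a single character, which is enough to defeat a plain substring comparison.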
The good news is that more advanced URL filters can work around the problem. It’s also likely that HTML 5 will limit the issue by making sure all browsers interpret HTML code in the same way.
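One way such a workaround could look (an assumption on my part, not a description of what Symantec or any particular filter does) is to normalize the text before checking it: decode HTML entities, then strip out soft hyphens. The sketch below uses Python's standard `html.unescape`; the blocklist and function names are hypothetical.

```python
import html

# The soft hyphen is the Unicode character U+00AD.
SOFT_HYPHEN = "\u00ad"

# Hypothetical blocklist of known bad domains (illustrative only).
BLOCKLIST = {"knownbadsite.com"}


def normalize(text: str) -> str:
    """Decode HTML entities (such as &shy; or &#173;), then drop soft hyphens."""
    return html.unescape(text).replace(SOFT_HYPHEN, "")


def filter_blocks(text: str) -> bool:
    """Return True if a blocklisted domain appears in the normalized text."""
    cleaned = normalize(text).lower()
    return any(domain in cleaned for domain in BLOCKLIST)


# Both the entity form and the raw character are caught after normalization.
print(filter_blocks('<a href="http://known&shy;badsite.com/">deal</a>'))
print(filter_blocks("http://known\u00adbadsite.com/"))
```

A production filter would likely need to handle other invisible characters too, but the same normalize-then-compare idea applies.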