Hex escape everything
July 29th, 2008
I like Unicode, and I use parts of it you probably haven’t seen, for example when I try to transliterate runes.
I’ve frequently run into cases, across different languages and document encodings, where UTF-8 on its own will not do the whole job. So I’ve written the following pair of PHP functions to convert a UTF-8 string into a string of hex-escaped HTML entities that I don’t have to worry about displaying. For valid UTF-8 input the result is plain ASCII, which is also valid ISO-8859-1.
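function _utf8_to_html($data) {
    // Strip the length-prefix bits from the lead byte, then tag it like a continuation byte.
    $lead = chr((ord($data[0]) % 252 % 248 % 240 % 224 % 192) + 128);
    $ret = 0;
    // Walk the bytes from last to first; each one carries 6 payload bits.
    foreach (str_split(strrev($lead . substr($data, 1))) as $k => $v)
        $ret += (ord($v) % 128) * pow(64, $k);
    return "&#x" . dechex($ret) . ";";
}
function cleanstring($string) {
    // preg_replace_callback replaces the /e modifier, which was removed in PHP 7.
    // The callback gets the raw match, so no stripslashes() is needed afterwards.
    return preg_replace_callback("/[\\xC0-\\xF7][\\x80-\\xBF]+/", function ($m) {
        return _utf8_to_html($m[0]);
    }, $string);
}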
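To see the arithmetic on a concrete rune, here is a minimal sketch that decodes the three bytes of ᚠ (U+16A0) by hand. The helper name decode_utf8_sequence is hypothetical and not part of the functions above. It uses bit masks where _utf8_to_html uses the chained modulo, which should give the same result for lead bytes in the 0xC0 to 0xF7 range, and it assumes the input is one well-formed multi-byte sequence.

```php
<?php
// Hypothetical helper for illustration only: decode ONE multi-byte
// UTF-8 sequence into its integer code point. No validation is done.
function decode_utf8_sequence($bytes) {
    $lead = ord($bytes[0]);
    // Keep only the payload bits of the lead byte.
    if ($lead >= 0xF0) {
        $code = $lead & 0x07;   // 11110xxx, 4-byte sequence
    } elseif ($lead >= 0xE0) {
        $code = $lead & 0x0F;   // 1110xxxx, 3-byte sequence
    } else {
        $code = $lead & 0x1F;   // 110xxxxx, 2-byte sequence
    }
    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for ($i = 1; $i < strlen($bytes); $i++) {
        $code = ($code << 6) | (ord($bytes[$i]) & 0x3F);
    }
    return $code;
}

$rune = "\xE1\x9A\xA0"; // the UTF-8 bytes of ᚠ
// 0xE1 -> 1, 0x9A -> 26, 0xA0 -> 32, so 1*64*64 + 26*64 + 32 = 5792 = 0x16A0
echo "&#x" . dechex(decode_utf8_sequence($rune)) . ";\n"; // prints &#x16a0;
```

The shift-and-mask form is the same calculation as multiplying by powers of 64 and taking bytes modulo 128; it just reads the bytes front to back instead of reversing the string first.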