Here is a function I wrote to build on the previous remarks about charset problems (UTF-8, ...) when using loadHTML and then the DOM functions.

It adds a charset meta tag right after <head> to improve automatic encoding detection, and converts every special (non-ASCII) character to an HTML entity, so that the PHP DOM functions/attributes return correct values.

<?php
mb_detect_order("ASCII,UTF-8,ISO-8859-1,windows-1252,iso-8859-15");

function loadNprepare($url, $encod = '') {
    $content = file_get_contents($url);
    // Download failed or page is empty: nothing to parse
    if (empty($content)) return FALSE;

    // No encoding given: guess it using the detection order set above
    if (empty($encod)) $encod = mb_detect_encoding($content);
    if (empty($encod)) return FALSE;

    // Locate the opening <head> tag (lowercase first, then uppercase)
    $headpos = mb_strpos($content, '<head>');
    if (FALSE === $headpos) $headpos = mb_strpos($content, '<HEAD>');
    if (FALSE !== $headpos) {
        $headpos += 6; // length of '<head>'
        // Insert the charset meta tag immediately after <head>
        $content = mb_substr($content, 0, $headpos)
            . '<meta http-equiv="Content-Type" content="text/html; charset=' . $encod . '">'
            . mb_substr($content, $headpos);
    }
    // Turn every non-ASCII character into an HTML entity
    $content = mb_convert_encoding($content, 'HTML-ENTITIES', $encod);

    $dom = new DOMDocument;
    $res = $dom->loadHTML($content);
    if (!$res) return FALSE;
    return $dom;
}
?>

NB: it uses mb_strpos/mb_substr instead of mb_ereg_replace because that seemed more efficient with huge HTML pages.
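
For newer PHP versions, where the 'HTML-ENTITIES' pseudo-encoding is deprecated (as of PHP 8.2), the same entity trick can probably be done with mb_encode_numericentity. The following is only a minimal sketch under a few assumptions: the helper name htmlStringToDom is hypothetical (it is not part of the function above), it takes an HTML string rather than a URL, it skips the meta tag insertion, and the list of candidate encodings is just an example.

```php
<?php
// Hypothetical helper: parse an HTML string after converting every
// non-ASCII character to a numeric entity, so loadHTML() never has to
// guess the charset.
function htmlStringToDom($html, $encoding = '')
{
    if ($html === '') {
        return false;
    }
    if ($encoding === '') {
        // Assumed candidate list; adjust it to the pages you expect
        $encoding = mb_detect_encoding($html, ['ASCII', 'UTF-8', 'ISO-8859-1'], true);
        if ($encoding === false) {
            return false;
        }
    }

    // Normalize to UTF-8, then encode U+0080 and above as &#NNN; entities
    $utf8 = mb_convert_encoding($html, 'UTF-8', $encoding);
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $entities = mb_encode_numericentity($utf8, $convmap, 'UTF-8');

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $ok = $dom->loadHTML($entities);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $dom : false;
}

// Usage: the input bytes are UTF-8 ("\xC3\xA9" is e-acute)
$html = "<html><head><title>Caf\xC3\xA9</title></head><body><p>ok</p></body></html>";
$dom = htmlStringToDom($html, 'UTF-8');
if ($dom !== false) {
    // DOM properties always return UTF-8 strings
    echo $dom->getElementsByTagName('title')->item(0)->textContent, "\n";
}
?>
```

Since the document handed to loadHTML is pure ASCII at that point, the charset meta tag becomes less important, but adding it as in the function above should do no harm.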