DOMDocument::loadHTMLFile

(PHP 5, PHP 7, PHP 8)

DOMDocument::loadHTMLFile - Load HTML from a file

Description

public DOMDocument::loadHTMLFile(string $filename, int $options = 0): bool

This function parses the HTML document in the file named filename. Unlike loading XML, HTML does not have to be well-formed to load.

Warning

Use Dom\HTMLDocument to parse and process modern HTML instead of DOMDocument.

This function parses the input using an HTML 4 parser. The parsing rules of HTML 5, which is what modern web browsers use, are different. Depending on the input this might result in a different DOM structure. Therefore this function cannot be safely used for sanitizing HTML.

The behavior when parsing HTML can depend on the version of libxml that is being used, particularly with regard to edge conditions and error handling. For parsing that conforms to the HTML5 specification, use Dom\HTMLDocument::createFromString() or Dom\HTMLDocument::createFromFile(), added in PHP 8.4.

As an example, some HTML elements will implicitly close a parent element when encountered. The rules for automatically closing parent elements differ between HTML 4 and HTML 5 and thus the resulting DOM structure that DOMDocument sees might be different from the DOM structure a web browser sees, possibly allowing an attacker to break the resulting HTML.
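
The following is a minimal sketch, not a definitive migration recipe, of preferring Dom\HTMLDocument::createFromFile() when it is available and falling back to DOMDocument::loadHTMLFile() otherwise. The temporary file and its markup are assumptions made up for illustration.

```php
<?php
// Write a small HTML file to a temporary location (illustrative content only).
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents(
    $path,
    '<!DOCTYPE html><html><head><title>T</title></head><body><p>One<p>Two</body></html>'
);

if (class_exists('Dom\HTMLDocument')) {
    // PHP 8.4+: HTML5-conformant parser.
    $doc = Dom\HTMLDocument::createFromFile($path, LIBXML_NOERROR);
} else {
    // Older PHP: libxml's HTML 4 parser, with diagnostics collected internally.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTMLFile($path);
    libxml_clear_errors();
}
unlink($path);

// Count the paragraphs found by whichever parser was used.
$count = $doc->getElementsByTagName('p')->length;
echo "Paragraphs found: $count\n";
```

For this simple input both parsers should agree; the two code paths are more likely to diverge on malformed or adversarial markup, which is the case the warning above is about.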

Parameters

filename

The path to the HTML file.

options

Bitwise OR of the libxml option constants.
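
As a hedged illustration of combining option constants, the sketch below passes LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD so that a fragment is loaded without an implied html/body wrapper or a default doctype. The temporary file and its content are assumptions for the example, and the exact serialized output may vary with the libxml version in use.

```php
<?php
// Create a temporary file containing an HTML fragment (illustrative only).
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<p>Hello</p>');

$doc = new DOMDocument();
// Combine libxml option constants with bitwise OR.
$doc->loadHTMLFile($path, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
unlink($path);

$html = $doc->saveHTML();
echo $html;
```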

Return Values

Returns true on success or false on failure.
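
A short sketch of checking the return value, assuming a path that does not exist (the file name here is generated for illustration and is expected to be missing):

```php
<?php
// Collect libxml diagnostics internally instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Hypothetical path that is assumed not to exist.
$path = sys_get_temp_dir() . '/no-such-file-' . uniqid() . '.html';

$loaded = $doc->loadHTMLFile($path);
if ($loaded === false) {
    echo "Could not load $path\n";
}
libxml_clear_errors();
```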

Errors/Exceptions

If an empty string is passed as filename or an empty file is named, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.

Although malformed HTML should load successfully, this function may generate E_WARNING errors when it encounters bad markup. libxml's error handling functions can be used to handle these errors.
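
The sketch below shows one way to handle markup errors with libxml's error handling functions. The malformed sample markup and temporary file are assumptions for illustration; which errors are reported, if any, depends on the libxml version.

```php
<?php
// Write deliberately sloppy markup to a temporary file (illustrative only).
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<html><body><p>Unclosed <b>bold<div>block</p></body></html>');

// Buffer libxml errors instead of raising E_WARNING, remembering the old setting.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTMLFile($path);

// Fetch the buffered errors, then clear the buffer and restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);
unlink($path);

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with level, code, line and message.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```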

Changelog

Version Description
8.3.0 This function now has a tentative bool return type.
8.0.0 Calling this function statically will now throw an Error. Previously, an E_DEPRECATED was raised.

Examples

Example #1 Creating a Document

<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
echo $doc->saveHTML();
?>


User Contributed Notes 4 notes

onemanbanddan at gmail dot com
11 years ago
The options for suppressing errors and warnings do not work with this method as they do with loadXML(). For example,
<?php
$doc->loadHTMLFile($file, LIBXML_NOWARNING | LIBXML_NOERROR);
?>
will not work. Instead, you must use:
<?php
libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
?>
and handle the errors as necessary.
Mark Omohundro, ajamyajax dot com
16 years ago
<?php
// Lists every node of an HTML file. The commented-out lines show a few
// alternative getElementsByTagName() selections; enable one at a time.

// $_SERVER['DOCUMENT_ROOT'] replaces the register_globals-era $DOCUMENT_ROOT.
$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

// example 1: all elements
$elements = $doc->getElementsByTagName('*');
// example 2:
//$elements = $doc->getElementsByTagName('html');
// example 3:
//$elements = $doc->getElementsByTagName('body');
// example 4:
//$elements = $doc->getElementsByTagName('table');
// example 5:
//$elements = $doc->getElementsByTagName('div');

if (!is_null($elements)) {
    foreach ($elements as $element) {
        echo "<br/>" . $element->nodeName . ": ";

        // Print the text value of each direct child node.
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue . "\n";
        }
    }
}
?>
andy at carobert dot com
19 years ago
This puts the HTML into a DOM object that can be queried by individual tags, attributes, and so on. Here is an example that gets the 'href' attribute and the corresponding node value of every 'a' tag.

<?php
$myhtml = <<<EOF
<html>
<head>
<title>My Page</title>
</head>
<body>
<p><a href="/mypage1">Hello World!</a></p>
<p><a href="/mypage2">Another Hello World!</a></p>
</body>
</html>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($myhtml);

$tags = $doc->getElementsByTagName('a');

foreach ($tags as $tag) {
    echo $tag->getAttribute('href') . ' | ' . $tag->nodeValue . "\n";
}
?>

This should output:

/mypage1 | Hello World!
/mypage2 | Another Hello World!
qrworld.net
10 years ago
In this post http://softontherocks.blogspot.com/2014/11/descargar-el-contenido-de-una-url_11.html I found a simple way to get the content of a URL with DOMDocument, loadHTMLFile and saveHTML().

function getURLContent($url) {
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false;
    @$doc->loadHTMLFile($url);
    return $doc->saveHTML();
}