What Is SmartDOMDocument?

  • SmartDOMDocument is an enhanced version of PHP's built-in DOMDocument class.
  • SmartDOMDocument inherits from DOMDocument, so it's very easy to use – just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).

 

What Is DOMDocument?

  • DOMDocument is a native PHP library for using DOM to read, parse, manipulate, and write HTML and XML.
  • Instead of using hacky regexes that are prone to breaking as soon as something you haven't thought of changes, DOMDocument parses HTML/XML using the DOM (Document Object Model), just like your browser, and creates an easily manipulatable object in memory.
  • DOMDocument can actually validate and normalize your HTML/XML.
  • DOMDocument supports namespaces.
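
As a quick illustration of the points above, here is a minimal sketch that parses a small HTML string with plain DOMDocument and collects the link targets, with no regexes involved. The sample markup and variable names are made up for this example.

```php
<?php
// Parse a small HTML page and list the link targets it contains.
$html = '<html><body><p>See <a href="/docs">the docs</a> and '
      . '<a href="/faq">the FAQ</a>.</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$hrefs = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    // Each $link is a DOMElement, so attributes are read through the DOM API.
    $hrefs[] = $link->getAttribute('href');
}

print implode(', ', $hrefs) . "\n";
```

The same loop keeps working if attributes are reordered or extra whitespace appears in the markup, which is exactly where regex-based approaches tend to break.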

 

So What Exactly Does SmartDOMDocument Do Then?

DOMDocument by itself is good but has a few annoyances, which SmartDOMDocument tries to correct. Here are some things it does:

 

saveHTMLExact()

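DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, when this was written there were no flags to turn this behavior off; PHP 5.4 and later, built against a recent enough libxml, accept the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options in loadHTML()).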

Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).

SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
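
To make the difference concrete, here is a minimal sketch using plain DOMDocument. The save_fragment() helper is a hypothetical name used only for illustration and is not how SmartDOMDocument implements saveHTMLExact(). It assumes PHP 5.3.6+, where saveHTML() accepts a node argument, and simply serializes the children of the <body> that DOMDocument added.

```php
<?php
// Hypothetical helper, not the library's code: serialize only what was loaded.
function save_fragment(DOMDocument $doc)
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if (!$body) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) dumps a single node; the argument needs PHP 5.3.6+.
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="note"><p>Hello</p></div>');

$full     = $doc->saveHTML();     // includes DOCTYPE, <html>, and <body>
$fragment = save_fragment($doc);  // only the <div> that was loaded

print $full . "\n" . $fragment . "\n";
```

The first value printed shows the extra wrapper described above, while the second contains just the original fragment.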

 

Encoding Fix

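DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly: unless the markup declares its own charset, loadHTML() does not treat the input as UTF-8, so non-ASCII characters come out garbled.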

SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you – just use loadHTML() as you would normally.
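
One common way to get this kind of behavior is sketched below. This is an assumption about the general technique, not SmartDOMDocument's actual code: the hypothetical load_utf8_html() helper converts every non-ASCII character to a numeric entity before parsing, so the parser never has to guess the encoding. It assumes the input is UTF-8 and that the mbstring extension is installed.

```php
<?php
// Hypothetical helper, not the library's code. Requires the mbstring extension.
function load_utf8_html(DOMDocument $doc, $html)
{
    // Turn every character above U+007F into a numeric entity such as &#233;,
    // leaving plain ASCII markup untouched.
    $map   = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    $ascii = mb_encode_numericentity($html, $map, 'UTF-8');

    return $doc->loadHTML($ascii);
}

$doc = new DOMDocument();
load_utf8_html($doc, '<p>Grüße, мир</p>');

// The DOM API hands text back as UTF-8.
$fixed = $doc->getElementsByTagName('p')->item(0)->textContent;

print $fixed . "\n";
```

Because the entities are decoded by the parser itself, the text nodes in the resulting document hold the intended characters rather than byte-by-byte mojibake.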

 

SmartDOMDocument Object As String

You can use a SmartDOMDocument object anywhere a string is expected, and it will print out its contents.

For example:

echo "Here is the HTML: $smart_dom_doc";
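
String interpolation like this relies on PHP's __toString() magic method. The sketch below shows the mechanism on a hypothetical PrintableDocument subclass; the class name and the choice to return saveHTML() are assumptions for illustration, not SmartDOMDocument's source.

```php
<?php
// Hypothetical subclass for illustration; not SmartDOMDocument's source.
class PrintableDocument extends DOMDocument
{
    // PHP calls this whenever the object is used in a string context.
    public function __toString()
    {
        return $this->saveHTML();
    }
}

$doc = new PrintableDocument();
$doc->loadHTML('<p>Hi</p>');

$message = "Here is the HTML: $doc";

print $message;
```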

 

Example

This example loads sample HTML using SmartDOMDocument, uses getElementsByTagName() to find the first <img> tag and removeChild() to remove it, then prints the content before and after the removal, followed by the HTML of the removed image.

    // Adjust the path to wherever you saved the class file.
    require_once 'SmartDOMDocument.class.php';

    $content = <<<CONTENT
<div class='class1'>
  <img src='http://www.google.com/favicon.ico' />
  Some Text
  <p>???????</p>
</div>
CONTENT;

    print "Before removing the image, the content is: " . htmlspecialchars($content) . "<br>";

    $content_doc = new SmartDOMDocument();
    $content_doc->loadHTML($content);

    $image = ''; // stays empty if the content has no <img>

    try {
      $first_image = $content_doc->getElementsByTagName("img")->item(0);

      if ($first_image) {
        // Detach the image, then save what is left of the content.
        $first_image->parentNode->removeChild($first_image);

        $content = $content_doc->saveHTMLExact();

        // Import the detached node into its own document to serialize it alone.
        $image_doc = new SmartDOMDocument();
        $image_doc->appendChild($image_doc->importNode($first_image, true));
        $image = $image_doc->saveHTMLExact();
      }
    } catch (Exception $e) {
      // Errors are deliberately ignored in this demo.
    }

    print "After removing the image, the content is: " . htmlspecialchars($content) . "<br>";
    print "The image is: " . htmlspecialchars($image);

 

Requirements And Prerequisites

  • PHP 5 with DOMDocument. PHP 5.2+ used to be required, but as of version 0.4 any version of PHP 5 that has DOMDocument should work.
  • DOMDocument. This should be a built-in class, but I've seen instances of it missing for some reason. My guess is that 99.9% of you already have it.

 

Sounds Great – Where Do I Get It?

Download

http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk/SmartDOMDocument.class.php

 

Check out from SVN

svn co http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk SmartDOMDocument

I highly recommend using SVN (Subversion) because you can easily update to the latest version by running svn up.

 

Use as "svn:externals"

If you have an existing project in SVN and you would like to use SmartDOMDocument, you can set up this library as svn:externals.

svn:externals is kind of like a symlink to another repository from your existing SVN project. That way, you can still benefit from using SVN commands such as svn up without having to maintain a local copy of the external code.

You can read more about setting svn:externals here.

Here's how you would do this:

cd YOUR_PROJ_DIR;
svn propset svn:externals 'SmartDOMDocument http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk' .
svn ci .
svn up

Git

Update 9/11/2015: I have moved the code from svn to git (Bitbucket). You can now find it here and if you have contributions, feel free to send a pull request.

 

Version History

0.4.1

  • added return value to loadHTML() (thanks, Grey)

0.4

  • No longer using C14N() because it is causing bugs (such as <br> turning into <br></br>)
  • PHP 5.2+ is no longer a requirement, due to the change above.

0.3.2

  • test/example function added

0.3.1

  • suppress warnings when loading HTML by default (this may change to use a setting later). This gets rid of "empty content", "unexpected tag", and other warnings about HTML that is not well formed.
  • add the standard trunk/tags/branches layout to the SVN repository.

0.3

  • use a better, more portable method of dealing with encodings properly (thanks piopier).

0.2

  • use the undocumented DOMDocument->C14N() if it's available (PHP 5.2+) to save exact HTML (really, PHP? We don't document extremely useful functions anymore?).

0.1

  • initial release.

 

How To Report Bugs

You have a few options here: