- What Is SmartDOMDocument?
 - What Is DOMDocument?
 - So What Exactly Does SmartDOMDocument Do Then?
 - saveHTMLExact()
 - Encoding Fix
 - SmartDOMDocument Object As String
 - Example
 - Requirements And Prerequisites
 - Sounds Great – Where Do I Get It?
 - Download
 - Check out from SVN
 - Use as "svn:externals"
 - Git
 - Version History
 - References
 - How To Report Bugs
 
What Is SmartDOMDocument?
- SmartDOMDocument is an enhanced version of PHP's built-in DOMDocument class.
 - SmartDOMDocument inherits from DOMDocument, so it's very easy to use – just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).
 
What Is DOMDocument?
- DOMDocument is a native PHP library for using DOM to read, parse, manipulate, and write HTML and XML.
 - Instead of using hacky regexes that are prone to breaking as soon as something you haven't thought of changes, DOMDocument parses HTML/XML using the DOM (Document Object Model), just like your browser, and creates an easily manipulatable object in memory.
 - DOMDocument can actually validate and normalize your HTML/XML.
 - DOMDocument supports namespaces.
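As a quick illustration of the points above, here is a minimal sketch that uses plain DOMDocument (no SmartDOMDocument needed) to parse a snippet and read attributes from it. The sample markup and variable names are made up for this example.

```php
<?php
// Parse a small HTML snippet with PHP's built-in DOMDocument.
$html = '<div><a href="/one">One</a> <a href="/two">Two</a></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Collect the href attribute of every <a> element, in document order.
$links = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

print implode(', ', $links) . "\n";
```

No regexes are involved: the parser builds the tree and the query walks it, so attribute order, quoting style, or extra whitespace in the markup do not affect the result.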
 
So What Exactly Does SmartDOMDocument Do Then?
DOMDocument by itself is good but has a few annoyances, which SmartDOMDocument tries to correct. Here are some things it does:
saveHTMLExact()
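DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, at the time of writing there were no flags to turn this behavior off; PHP 5.4 later added the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options for loadHTML()).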
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
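To make the problem concrete, the sketch below shows the wrapper that plain DOMDocument adds, plus one possible way to get the fragment back. The helper save_fragment_html() is hypothetical and written only for this illustration; it is not SmartDOMDocument's actual saveHTMLExact() implementation, and it assumes PHP 5.3.6+ so that saveHTML() accepts a node argument.

```php
<?php
// Hypothetical helper (not part of SmartDOMDocument): serialize only the
// children of <body>, skipping the DOCTYPE/<html>/<body> wrapper.
function save_fragment_html(DOMDocument $doc)
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes a single node (PHP 5.3.6+).
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<p>Hello</p>');

$full     = $doc->saveHTML();          // includes DOCTYPE, <html>, and <body>
$fragment = save_fragment_html($doc);  // only the markup that was loaded

print "Full document:\n" . $full . "\n";
print "Fragment only:\n" . $fragment . "\n";
```

With SmartDOMDocument you would simply call saveHTMLExact() instead of writing a helper like this.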
Encoding Fix
DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you – just use loadHTML() as you would normally.
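For readers curious what such a workaround can look like, here is a minimal sketch of one common approach: declaring the charset before the markup so the parser decodes the input as UTF-8. The helper load_utf8_html() is hypothetical and is an assumption about how the problem can be tackled, not a copy of what SmartDOMDocument does internally.

```php
<?php
// Hypothetical helper (an assumption, not SmartDOMDocument's actual code):
// declare the charset up front so the HTML parser treats the input as UTF-8.
function load_utf8_html(DOMDocument $doc, $html)
{
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    return $doc->loadHTML($meta . $html);
}

$doc = new DOMDocument();
load_utf8_html($doc, '<p>Привет, мир</p>');

// The text comes back as UTF-8 with the Cyrillic characters intact.
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
print $text . "\n";
```

One side effect of this particular approach is that the added <meta> element becomes part of the parsed document, so it has to be skipped or removed when saving a fragment. SmartDOMDocument hides this kind of detail behind its own loadHTML().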
SmartDOMDocument Object As String
You can use a SmartDOMDocument object as a string, which prints out its contents.
For example:
echo "Here is the HTML: $smart_dom_doc";
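String conversion like this is typically done with PHP's __toString() magic method. The sketch below uses a hypothetical stand-in class, PrintableDocument, to show the idea; it is an assumption about the mechanism, not the real SmartDOMDocument source, and it returns the full saveHTML() output rather than the exact fragment.

```php
<?php
// Hypothetical stand-in class (not the real SmartDOMDocument source):
// __toString() lets the object be interpolated into strings.
class PrintableDocument extends DOMDocument
{
    public function __toString()
    {
        // Cast so that a failed save still yields a string.
        return (string) $this->saveHTML();
    }
}

$doc = new PrintableDocument();
$doc->loadHTML('<p>Hi</p>');

$message = "Here is the HTML: $doc";
print $message;
```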
Example
This example loads sample HTML using SmartDOMDocument, uses getElementsByTagName() to find the first <img> tag and removeChild() to remove it, then prints the content before and after the removal, followed by the HTML of the removed image.
// Adjust the path to wherever you saved the downloaded class file.
require_once 'SmartDOMDocument.class.php';

$content = <<<CONTENT
<div class='class1'>
  <img src='http://www.google.com/favicon.ico' />
  Some Text
  <p>???????</p>
</div>
CONTENT;

print "Before removing the image, the content is: " . htmlspecialchars($content) . "<br>";

$content_doc = new SmartDOMDocument();
$content_doc->loadHTML($content);

// Stays empty if the content has no <img> tag.
$image = '';

try {
  $first_image = $content_doc->getElementsByTagName("img")->item(0);

  if ($first_image) {
    // Detach the image from the document, then re-save the remaining HTML.
    $first_image->parentNode->removeChild($first_image);

    $content = $content_doc->saveHTMLExact();

    // Import the removed node into a fresh document to serialize it on its own.
    $image_doc = new SmartDOMDocument();
    $image_doc->appendChild($image_doc->importNode($first_image, true));
    $image = $image_doc->saveHTMLExact();
  }
} catch(Exception $e) {
  // Ignore DOM errors and print whatever we have so far.
}

print "After removing the image, the content is: " . htmlspecialchars($content) . "<br>";
print "The image is: " . htmlspecialchars($image);
Requirements And Prerequisites
- PHP 5.2+ is no longer a requirement: any version of PHP 5 that has DOMDocument should work now.
 - DOMDocument: this should be a built-in class, but I've seen instances of it missing for some reason. You almost certainly already have it.
 
Sounds Great – Where Do I Get It?
Download
http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk/SmartDOMDocument.class.php
Check out from SVN
svn co http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk SmartDOMDocument
I highly recommend using SVN (Subversion) because you can easily update to the latest version by running svn up.
Use as "svn:externals"
If you have an existing project in SVN and you would like to use SmartDOMDocument, you can set up this library as svn:externals.
svn:externals is kind of like a symlink to another repository from your existing SVN project. That way, you can still benefit from using SVN commands such as svn up without having to maintain a local copy of the external code.
You can read more about setting svn:externals here.
Here's how you would do this:
cd YOUR_PROJ_DIR;
svn propset svn:externals 'SmartDOMDocument http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk' .
svn ci .
svn up
Git
Update 9/11/2015: I have moved the code from svn to git (Bitbucket). You can now find it here and if you have contributions, feel free to send a pull request.
Version History
0.4.1
- added return value to loadHTML() (thanks, Grey)
 
0.4
- No longer using C14N() because it is causing bugs (such as <br> turning into <br></br>)
 - PHP 5.2+ is no longer a requirement, due to the change above.
 
0.3.2
- test/example function added
 
0.3.1
- suppress warnings when loading HTML by default (this may change to use a setting later). This gets rid of "empty content", "unexpected tag", and other warnings about HTML that is not well formed.
 - add the standard trunk/tags/branches layout to the SVN repository.
 
0.3
- use a better, more portable method of dealing with encodings properly (thanks piopier).
 
0.2
- use the undocumented DOMDocument->C14N() if it's available (PHP 5.2+) to save exact HTML (really, PHP? We don't document extremely useful functions anymore?).
 
0.1
- initial release.
 
References
How To Report Bugs
You have a few options here:
- Leave a comment here.
 - Send a pull request.
 
