- What Is SmartDOMDocument?
- What Is DOMDocument?
- So What Exactly Does SmartDOMDocument Do Then?
- saveHTMLExact()
- Encoding Fix
- SmartDOMDocument Object As String
- Example
- Requirements And Prerequisites
- Sounds Great – Where Do I Get It?
- Download
- Check out from SVN
- Use as "svn:externals"
- Git
- Version History
- References
- How To Report Bugs
What Is SmartDOMDocument?
- SmartDOMDocument is an enhanced version of PHP's built-in DOMDocument class.
- SmartDOMDocument inherits from DOMDocument, so it's very easy to use – just declare an object of type SmartDOMDocument instead of DOMDocument and enjoy the new behavior on top of all existing functionality (see example below).
What Is DOMDocument?
- DOMDocument is a native PHP library for using DOM to read, parse, manipulate, and write HTML and XML.
- Instead of using hacky regexes that are prone to breaking as soon as something you haven't thought of changes, DOMDocument parses HTML/XML using the DOM (Document Object Model), just like your browser, and creates an easily manipulatable object in memory.
- DOMDocument can actually validate and normalize your HTML/XML.
- DOMDocument supports namespaces.
So What Exactly Does SmartDOMDocument Do Then?
DOMDocument by itself is good but has a few annoyances, which SmartDOMDocument tries to correct. Here are some things it does:
saveHTMLExact()
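DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, when this was written there were no flags to turn this behavior off; PHP 5.4+ built against libxml 2.7.8 or newer has since gained the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options for loadHTML()).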
Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem).
SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
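To make the problem concrete, here is a minimal sketch using plain DOMDocument only. The helper save_body_children() is a hypothetical name invented for this illustration, and serializing the children of <body> is just one possible workaround; it is not necessarily how saveHTMLExact() is implemented. It assumes PHP 5.3.6+, where saveHTML() accepts a node argument.

```php
<?php
// A fragment with no <html> or <body> tags of its own.
$fragment = "<div class='class1'><p>Some Text</p></div>";

$doc = new DOMDocument();
@$doc->loadHTML($fragment);   // @ hides warnings about imperfect HTML

// Plain saveHTML() returns the fragment wrapped in DOCTYPE/<html>/<body>.
$wrapped = $doc->saveHTML();

// Hypothetical helper (an assumption for this sketch, not part of
// SmartDOMDocument): serialize only the nodes inside <body>.
function save_body_children(DOMDocument $doc) {
    $body = $doc->getElementsByTagName('body')->item(0);
    if (!$body) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

$exact = save_body_children($doc);

echo htmlspecialchars($wrapped), "\n";
echo htmlspecialchars($exact), "\n";
```

The first line printed contains the added wrapper, the second only the original fragment.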
Encoding Fix
DOMDocument notoriously doesn't handle encoding (at least UTF-8) correctly and garbles the output.
SmartDOMDocument tries to work around this problem by enhancing loadHTML() to deal with encoding correctly. This behavior is transparent to you – just use loadHTML() as you would normally.
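For a feel of what such a workaround can look like, here is a small sketch with plain DOMDocument. It is an assumption for illustration, not necessarily the method SmartDOMDocument uses internally: one common trick is to prepend a charset <meta> tag so that the parser reads the input as UTF-8. The variable names are made up for this example.

```php
<?php
// "café" written as raw UTF-8 bytes, so the example does not depend
// on the encoding of this source file.
$html = "<p>caf\xC3\xA9</p>";

// Without a charset hint the parser has to guess the input encoding
// and may garble the non-ASCII character.
$plain = new DOMDocument();
@$plain->loadHTML($html);

// Workaround sketch (an assumption, not SmartDOMDocument's own code):
// declare the charset up front with a <meta> tag.
$hint  = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$fixed = new DOMDocument();
@$fixed->loadHTML($hint . $html);

$text = $fixed->getElementsByTagName('p')->item(0)->textContent;

echo $text, "\n";
```

Note that the hint ends up as a <meta> element in the document's <head>, so you would want to leave it out when saving a fragment.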
SmartDOMDocument Object As String
You can use a SmartDOMDocument object as a string which will print out its contents.
For example:
    echo "Here is the HTML: $smart_dom_doc";
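Behavior like this is usually built on PHP's __toString() magic method. The sketch below shows the idea on a bare subclass of DOMDocument; the class name StringableDOMDocument is hypothetical, and returning saveHTML() is an assumption made for this example (SmartDOMDocument may well return something different, such as the output of saveHTMLExact()).

```php
<?php
// Hypothetical subclass, for illustration only.
class StringableDOMDocument extends DOMDocument {
    // Called automatically whenever the object is used as a string.
    public function __toString() {
        return $this->saveHTML();
    }
}

$doc = new StringableDOMDocument();
@$doc->loadHTML("<p>Hello</p>");

// The object is interpolated like any other string value.
$message = "Here is the HTML: $doc";

echo htmlspecialchars($message), "\n";
```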
Example
This example loads sample HTML using SmartDOMDocument, uses getElementsByTagName() to find the first <img> tag and removeChild() to remove it, then prints the original HTML, the HTML after the removal, and the removed image's HTML.

    // Sample HTML fragment (note: no <html> or <body> tags).
    $content = <<<CONTENT
    <div class='class1'>
      <img src='http://www.google.com/favicon.ico' />
      Some Text
      <p>???????</p>
    </div>
    CONTENT;

    print "Before removing the image, the content is: " . htmlspecialchars($content) . "<br>";

    $content_doc = new SmartDOMDocument();
    $content_doc->loadHTML($content);

    // Stays empty if the content has no <img> tag.
    $image = '';

    try {
      $first_image = $content_doc->getElementsByTagName("img")->item(0);

      if ($first_image) {
        // Detach the image from the document and save what is left.
        $first_image->parentNode->removeChild($first_image);
        $content = $content_doc->saveHTMLExact();

        // Import the removed node into its own document to get its HTML.
        $image_doc = new SmartDOMDocument();
        $image_doc->appendChild($image_doc->importNode($first_image, true));
        $image = $image_doc->saveHTMLExact();
      }
    } catch(Exception $e) { }

    print "After removing the image, the content is: " . htmlspecialchars($content) . "<br>";
    print "The image is: " . htmlspecialchars($image);
Requirements And Prerequisites
- PHP 5.2+ is no longer a requirement. Any version of PHP 5 that has DOMDocument should work now.
- DOMDocument. This should be a built-in class, but I've seen instances of it missing for some reason. My guess is that 99.9% of the time you will already have it.
Sounds Great – Where Do I Get It?
Download
http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk/SmartDOMDocument.class.php
Check out from SVN
    svn co http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk SmartDOMDocument
I highly recommend using SVN (Subversion) because you can easily update to the latest version by running svn up.
Use as "svn:externals"
If you have an existing project in SVN and you would like to use SmartDOMDocument, you can set up this library as svn:externals.
svn:externals is kind of like a symlink to another repository from your existing SVN project. That way, you can still benefit from using SVN commands such as svn up without having to maintain a local copy of the external code.
You can read more about setting svn:externals here.
Here's how you would do this:
    cd YOUR_PROJ_DIR
    svn propset svn:externals 'SmartDOMDocument http://svn.beerpla.net/repos/public/PHP/SmartDOMDocument/trunk' .
    svn ci .
    svn up
Git
Update 9/11/2015: I have moved the code from svn to git (Bitbucket). You can now find it here and if you have contributions, feel free to send a pull request.
Version History
0.4.1
- added return value to loadHTML() (thanks, Grey)
0.4
- No longer using C14N() because it is causing bugs (such as <br> turning into <br></br>)
- PHP 5.2+ is no longer a requirement, due to the change above.
0.3.2
- test/example function added
0.3.1
- suppress warnings when loading HTML by default (this may change to use a setting later). This gets rid of "empty content", "unexpected tag", and other warnings about HTML that is not well formed.
- add the standard trunk/tags/branches layout to the SVN repository.
0.3
- use a better, more portable method of dealing with encodings properly (thanks piopier).
0.2
- use the undocumented DOMDocument->C14N() if it's available (PHP 5.2+) to save exact HTML (really, PHP? We don't document extremely useful functions anymore?).
0.1
- initial release.
References
How To Report Bugs
You have a few options here:
- Leave a comment here.
- Send a pull request.