How To Fix Incomplete WordPress (WXR) Exports

Posted by Artem Russakovskii on April 13th, 2012 in PHP, Programming, Tips, Wordpress

Having spent way more time on this problem than I really should have, I'm going to make sure everyone can actually find a solution instead of useless WordPress support threads.

The Problem

I wanted to export all the data from WordPress using its native export mechanism (located at http://YOURBLOG/wp-admin/export.php), but since the blog I was working on was pretty large (6k posts, 120k comments), I kept getting XML files that ended prematurely and for which xmllint spit out this error:

Premature end of data in tag channel

Upon closer inspection, I saw the XML file ended with a random, yet always fully closed, </item> tag, but was missing the closing </channel> and </rss> tags, as well as a whole bunch of data.
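A quick way to spot this kind of truncation without a full XML validator is to look at the tail of the file. The snippet below is only a rough sketch: `wxr_looks_complete` is a made-up helper name, not part of WordPress, and it assumes the closing tags appear within the last 256 bytes of the export.

```php
<?php
// Hypothetical helper (not part of WordPress): report whether WXR text
// ends with the closing </channel> and </rss> tags.
function wxr_looks_complete(string $xml): bool {
    // Only the tail matters, so avoid scanning the whole export.
    $tail = substr($xml, -256);
    // Allow whitespace between the two closing tags and after the last one.
    return preg_match('~</channel>\s*</rss>\s*$~', $tail) === 1;
}

// Usage sketch: 'export.xml' is a placeholder path.
// var_dump(wxr_looks_complete(file_get_contents('export.xml')));
```

This only checks how the file ends, so it is not a substitute for running xmllint on the export.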

My immediate theory was that PHP was running out of memory. Interestingly, the number of <item> elements was always divisible by 20, and after looking at the code in export.php, I saw why: the loop grabs 20 posts at a time. That made it obvious that the code was crashing while processing one of the batches, which only made the out-of-memory theory stronger.
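If you want to check the same symptom on your own export, something along these lines should do. Both function names are hypothetical, and the count assumes the literal string `<item>` does not also appear inside post content.

```php
<?php
// Hypothetical helper: count <item> openings in the export text.
function wxr_item_count(string $xml): int {
    return substr_count($xml, '<item>');
}

// Hypothetical helper: true when the count is a whole number of batches.
// The default of 20 matches the batch size described above.
function wxr_is_whole_batches(int $itemCount, int $batchSize = 20): bool {
    return $itemCount % $batchSize === 0;
}
```

A count that is a multiple of 20 is only a hint, since a complete export can land on a multiple of 20 by coincidence. It is most telling when the closing tags are missing as well.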

The Solution

After raising the memory limit in the export.php file itself and verifying the fix, I came up with the following solution, which removes the need to modify core WordPress files. Just add this somewhere in your theme's functions.php:

/**
 * Dynamically increase allowed memory limit for export.
 */
function my_export_wp() {
  ini_set('memory_limit', '1024M');
}
add_action('export_wp', 'my_export_wp');

The code above will dynamically set the memory limit high enough for all but unimaginably large jobs to complete. Feel free to adjust this limit if you have, for instance, millions of comments and lots of RAM on the web server.

And there you have it – full exports. Much better, isn't it?

P.S. Some shared hosting providers have PHP set to ignore the directive above, in which case neither this solution nor any other, short of upgrading your hosting, will do anything.
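To find out whether your host honors the directive, you can check what PHP reports after trying to change it. This is a minimal sketch: `try_raise_memory_limit` is a hypothetical name, and it relies on `ini_set` returning false on failure and on `ini_get` reading the value back.

```php
<?php
// Hypothetical helper: try to raise the limit and report whether PHP accepted it.
function try_raise_memory_limit(string $limit): bool {
    // ini_set() returns the previous value on success and false on failure.
    $previous = ini_set('memory_limit', $limit);
    // Read the setting back to confirm the new value is actually in effect.
    return $previous !== false && ini_get('memory_limit') === $limit;
}

// Usage sketch: a false result suggests the host ignores or blocks the change.
// var_dump(try_raise_memory_limit('1024M'));
```

Run it in the same environment that serves the export (the web server, not the command line), since the two can use different PHP settings.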

P.P.S. There is a bug (#15203) in WordPress <=3.3.1, which doesn't properly escape posts that contain CDATA and breaks XML. It is slated for a fix in WordPress 3.4, but if you haven't upgraded yet (it's not even out at the time of this writing), you can still apply the fix manually, like so. My dumps wouldn't validate before the fix, but I've confirmed that they now fully validate after.

● ● ●
Artem Russakovskii is a San Francisco programmer and blogger. Follow Artem on Twitter (@ArtemR) or subscribe to the RSS feed.
