The Developer Day | Staying Curious

March 31, 2009

PHP XML formatter tool rewrite

A while ago I blogged about an XML Beautifier Tool that tokenizes an XML string and outputs it in a human-readable format. Strangely enough, I noticed that I get quite a few pageviews from people searching for such a tool. Though it may be useful to a lot of people, I think it is flawed and has serious issues with formatting, speed and memory consumption.

This inspired me to write a new version of the XML formatter. It’s based on a SAX parser, which is kind of “ugly” to implement and build around, but because of its event-based nature it’s super fast and has a very low memory footprint. The new version of the formatter shouldn’t peak higher than 200-300 KB of memory even when the XML files start to weigh over a megabyte. It also should not have any problems with indentation, because it no longer tries to tokenize the XML itself and uses libxml to do the job. I also tried to make the tool documented and extendable.
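To illustrate the event-based idea, here is a minimal sketch of SAX-style indentation built on PHP’s ext/xml functions. It is not the XML_Formatter source: the function name `sketch_format_xml`, the two-space indent and the 4 KB chunk size are assumptions made for the example. The input is read in small chunks, so only the current chunk and the nesting depth are held in memory.

```php
<?php
// Illustrative sketch only; sketch_format_xml() is a hypothetical name,
// not part of XML_Formatter.
function sketch_format_xml($input, $output, string $indent = "  "): void
{
    $depth = 0;
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Opening tag: write it at the current depth, then go one level deeper.
        function ($p, string $name, array $attrs) use (&$depth, $output, $indent) {
            $attrText = '';
            foreach ($attrs as $key => $value) {
                $attrText .= sprintf(' %s="%s"', $key, htmlspecialchars($value, ENT_QUOTES));
            }
            fwrite($output, str_repeat($indent, $depth) . "<{$name}{$attrText}>\n");
            $depth++;
        },
        // Closing tag: step back one level, then write it.
        function ($p, string $name) use (&$depth, $output, $indent) {
            $depth--;
            fwrite($output, str_repeat($indent, $depth) . "</{$name}>\n");
        }
    );

    xml_set_character_data_handler(
        $parser,
        // Text: skip whitespace-only runs and re-escape what the parser decoded.
        function ($p, string $data) use (&$depth, $output, $indent) {
            $text = trim($data);
            if ($text !== '') {
                fwrite($output, str_repeat($indent, $depth) . htmlspecialchars($text, ENT_NOQUOTES) . "\n");
            }
        }
    );

    // Feed the parser chunk by chunk; the last call marks the end of input.
    while (!feof($input)) {
        $chunk = fread($input, 4096);
        if (!xml_parse($parser, $chunk, feof($input))) {
            $message = xml_error_string(xml_get_error_code($parser));
            throw new RuntimeException("XML error: {$message}");
        }
    }
}
```

The sketch ignores comments, processing instructions and self-closing tags, and the parser may deliver one text node in several pieces, so a real formatter needs more bookkeeping than this.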

Its usage is really simple. All you have to do is initialize an XML_Formatter object by passing an XML input stream, an XML output stream and an array of options, and then call the format method. You might wonder why it requests input and output streams instead of a file name or a string. It does that to avoid high memory consumption. Here’s an example of how one might use XML_Formatter:

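require 'XMLFormatter.php';

// Streams are used instead of strings to keep memory usage low.
$input = fopen("input.xml", "r");
$output = fopen("output.xml", "w+");
if ($input === false || $output === false) {
    die("Could not open the input or output file\n");
}
try {
    $formatter = new XML_Formatter($input, $output);
    $formatter->format();
    echo "Success!";
} catch (Exception $e) {
    echo $e->getMessage(), "\n";
} finally {
    fclose($input);
    fclose($output);
}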

This tool is quite powerful in what it can do (I was able to format other websites’ XHTML or tidied HTML sources), but it also has some problems that are not actually related to the formatter yet may seem odd to the user. The PHP XML parser does not understand entities such as &nbsp; or unescaped ampersands like in ?x=1&y=1. So it’s the user’s responsibility to provide “correct” XML to the formatter.
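One possible way to clean up such input before formatting is sketched below. Both helper names are hypothetical and not part of XML_Formatter. They work on a whole string, so they give up the low memory footprint, and they would also rewrite ampersands inside CDATA sections, which may not be what you want.

```php
<?php
// Hypothetical helper: escapes every "&" that does not start a named entity
// or a numeric character reference.
function escape_bare_ampersands(string $xml): string
{
    return preg_replace(
        '/&(?![A-Za-z_][A-Za-z0-9._-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $xml
    );
}

// Hypothetical helper: replaces the HTML-only &nbsp; entity with the
// equivalent numeric reference, which any XML parser accepts.
function replace_nbsp(string $xml): string
{
    return str_replace('&nbsp;', '&#160;', $xml);
}
```

Applying `replace_nbsp()` first and `escape_bare_ampersands()` second covers the two cases mentioned above. Other HTML-only entities would need the same treatment.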

Other than that I hope it will prove useful to someone. Download the latest version of the XML_Formatter.


16 Comments for PHP XML formatter tool rewrite

The Developer Day » Blog Archive » XML beautifier tool | September 8, 2009 at 2:34 AM

[...] post describes a tool that I wrote long time ago. By now I have published a new refactored version of the XML beautifier which solves a few problems of the original [...]

Aswin Anand | September 28, 2009 at 9:03 AM

Do you have a zip archive of this?

Author comment by admin | September 28, 2009 at 10:16 AM

Hi,

Silly of me. Added a tar.gz archive :)

hakre | January 25, 2010 at 5:50 PM

Thanks for providing the source code. Can you say under which license you released it?

Author comment by Žilvinas Šaltys | January 25, 2010 at 6:05 PM

You’re welcome. No licence. Free for all.

Ken S | February 27, 2010 at 1:10 PM

Thanks a TON! I was trying to figure out my XML doc for about 6hrs. After I used your MAGIC rewrite tool, I had it figured out in about 5 mins!!!

Nick Weavers | October 4, 2010 at 10:04 PM

Very useful, thanks for sharing it. Would be nice to have it work with strings too. I just want to format SOAP headers so passing in a string variable and getting another out would be very convenient.

Rob | October 14, 2010 at 9:46 PM

Shot - this is awesome

Author comment by Žilvinas Šaltys | October 14, 2010 at 10:42 PM

You can use it with strings. All you need to do is convert your strings to streams:

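// Base64-encode the string so characters such as "%" or "#" survive the data: URI.
$input = fopen('data://text/plain;base64,' . base64_encode($input), 'r');
$output = fopen('php://temp', 'w+');

$formatter = new XML_Formatter($input, $output);
$formatter->format();

rewind($output);
// htmlspecialchars() is only needed when the result is shown inside an HTML page.
$result = htmlspecialchars(stream_get_contents($output), ENT_QUOTES);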
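The same idea can be wrapped in a small helper so that callers pass a string in and get a string back. This is only a sketch: `with_string_streams` is a hypothetical name, and it uses php://temp for both streams rather than a data: URI, which is an assumption of the example and not something the formatter requires.

```php
<?php
// Hypothetical helper: runs any stream-based formatter on a string.
// $format receives the input and output streams and does the actual work.
function with_string_streams(string $xml, callable $format): string
{
    // php://temp keeps small payloads in memory and spills large ones to disk.
    $input = fopen('php://temp', 'w+');
    fwrite($input, $xml);
    rewind($input);
    $output = fopen('php://temp', 'w+');

    try {
        $format($input, $output);
        rewind($output);
        return stream_get_contents($output);
    } finally {
        fclose($input);
        fclose($output);
    }
}

// Possible usage with XML_Formatter (requires XMLFormatter.php to be loaded):
// $pretty = with_string_streams($soapHeader, function ($in, $out) {
//     (new XML_Formatter($in, $out))->format();
// });
```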

Eric M | November 19, 2010 at 1:27 PM

This is the only thing I’ve seen to do what I needed. Thanks a ton!!! So easy to use too, even for a novice.

Tommy K. | November 26, 2010 at 10:49 AM

Excuse me for my *maybe* noob request, but can you make this tool available with a textbox input like the older version (paste the code hit the button and it’s beautifully formatted), I’m working as a front-end developer and sometimes I need to make minor modifications in xsl files which are horribly formatted, and this would be a really big help.

Much appreciated your work!

Author comment by Žilvinas Šaltys | November 26, 2010 at 10:51 AM

This is actually your lucky day. One of my colleagues recently needed such a tool and I’ve created it for him but never published it. You can find it here: http://www.thedeveloperday.com/tools/beautyXML2/ Let me know if it works ok for you.

Tommy K. | November 26, 2010 at 11:08 AM

Yeap, it does what I need beautifully, would you make this available for download?

Jon | October 16, 2011 at 12:47 PM

Great effort Žilvinas - thank you.

I am finding that it removes comments, which I would like to preserve. Also, it replaces self-closed tags with a manual close - I’d like to respect the choice of the original XML.

I’ll have a hack about to see if I can sort these things out, but do let me know if there’s any known fixes for them :)

Jon | October 16, 2011 at 5:24 PM

I’ve got a fix for the close tag issue. I suspect this works fine on your install, since you are probably using libxml2; since I am using Expat, the tag is not consumed until after the start_element_handler is called, and accordingly the _open value is wrongly set.

I will push my fix to Github in due course.

XMLFormatter2 | Crowdedplace | January 16, 2013 at 1:41 PM

[...] is a XML beautifying tool deriving from XML_Formatter originally developed by Žilvinas Šaltys. If you are in need of handling big files you should still resort to that solution, since it reads [...]
