Simple XML Class

Barrucadu

Here is a simple class to parse XML files:

PHP:
// Known Bugs:
//	Does not support nesting of complex types eg:
//		<tag><tag2><tag3>hi</tag3></tag2></tag>
//		would display as tag => null, tag2 => tag3 = "hi"
class parseXML{
	// File URL
	var $file;
	
	//Private variables
	var $parser;
	var $data;
	var $index_array;
	var $data_array;
	var $xml;
	
	//Tag variables
	var $root_tags = array();
	var $sub_tags = array();
	
	function presetRSS(){
		// Set up the tags for RSS 2.0
		$this->root_tags = array('title', 'link', 'description', 'language', 'lastBuildDate', 'copyright');
		$this->sub_tags = array('item' => array('title', 'description', 'link', 'guid', 'pubDate', 'category'));
	}
	
	function print_error(){
		// Display xml error message
		die(sprintf("XML Error: %s at line %d", 
			xml_error_string(xml_get_error_code($this->parser)), 
			xml_get_current_line_number($this->parser) 
		)); 
	}

	function parse_file(){
		// Create the parser
		$this->parser = xml_parser_create();
		// Skip whitespace between elements
		xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
		// Disable upper-casing.
		xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);
		// Read the data into $data
		$this->data = file_get_contents($this->file);
		// Parse $data into two arrays: $data_array gets one entry per open, close or
		// complete tag, and $index_array maps each tag name to its positions in $data_array
		xml_parse_into_struct($this->parser, $this->data, $this->data_array, $this->index_array) or $this->print_error();
		
		// Stick it all in a nice array
		$this->xml = array();
		
		// Handle root tags (format: $xml[tag], or $xml[tag]['value'] and
		// $xml[tag]['xml:attributes'] when the tag has attributes)
		foreach($this->root_tags as $tag){
			// Skip tags that do not appear in the document
			if(!isset($this->index_array[$tag])){
				continue;
			}
			$node = $this->data_array[$this->index_array[$tag][0]];
			$text = isset($node['value']) ? $node['value'] : null;
			// xml_parse_into_struct() stores attributes under 'attributes'
			if(!empty($node['attributes'])){
				$this->xml[$tag] = array('value' => $text, 'xml:attributes' => $node['attributes']);
			}else{
				$this->xml[$tag] = $text;
			}
		}
		
		// Now for child tags (format: $xml[parent][number][child])
		foreach($this->sub_tags as $key => $value){
			if(!isset($this->index_array[$key])){
				continue;
			}
			// Each parent has an 'open' and a 'close' entry in the index
			$children = count($this->index_array[$key]) / 2;
			$i = 0;
			$j = 0;
			while ($i < $children){
				// Skip every 2nd entry, as it is a closing tag and thus has no attributes
				if($j % 2 == 0){
					$parent = $this->data_array[$this->index_array[$key][$j]];
					if(!empty($parent['attributes'])){
						$this->xml[$key][$i]['xml:attributes'] = $parent['attributes'];
					}
					foreach($value as $subtag){
						// Skip child tags that have no i-th occurrence
						if(!isset($this->index_array[$subtag][$i])){
							continue;
						}
						$node = $this->data_array[$this->index_array[$subtag][$i]];
						if(isset($this->sub_tags[$subtag])){
							$this->xml[$key][$i][$subtag] = 'xml:complex_tag';
						}else{
							$text = isset($node['value']) ? $node['value'] : null;
							if(!empty($node['attributes'])){
								$this->xml[$key][$i][$subtag] = array('value' => $text, 'xml:attributes' => $node['attributes']);
							}else{
								$this->xml[$key][$i][$subtag] = $text;
							}
						}
					}
					$i ++;
				}
				$j ++;
			}
		}

		// Free the XML parser
		xml_parser_free($this->parser);
	}
}
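The class is built on xml_parse_into_struct(). A minimal sketch on a made-up two-item document shows the two arrays it fills in, and why parse_file() halves count($index_array[$key]) and skips every second entry: each parent tag gets an 'open' and a 'close' entry, while a leaf tag gets a single 'complete' entry holding its text.

```php
<?php
// Made-up sample: a channel title followed by two <item> parents
$doc = '<rss><channel><title>Feed</title>'
     . '<item><title>One</title></item>'
     . '<item><title>Two</title></item>'
     . '</channel></rss>';

$parser = xml_parser_create();
// Same options as parse_file(): skip whitespace, keep tag-name case
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $doc, $values, $index);

// Parent tags: an 'open' and a 'close' entry each, so count($index['item']) is 2 per item
foreach ($index['item'] as $pos) {
    echo "item at $pos: {$values[$pos]['type']}\n";   // open, close, open, close
}
// Leaf tags: a single 'complete' entry whose 'value' holds the text
foreach ($index['title'] as $pos) {
    echo "title at $pos: {$values[$pos]['value']}\n"; // Feed, One, Two
}
```

Note that the channel's <title> is listed in $index['title'] alongside the item titles; the index has no notion of which parent a tag belongs to.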

Here is an example that parses an RSS 2.0 feed. As you can see, the class has a presetRSS() method that sets it up for RSS:

PHP:
$xml = new parseXML;

// RSS 2.0 feed (only tested with BBC News)
$xml->file = 'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml';
$xml->presetRSS();

$xml->parse_file();
$parsedXML = $xml->xml;

echo '<pre>';
print_r($parsedXML);
echo '</pre>';

(I can't believe I'm on holiday in Poland and still doing this.)
 
Not bad Michael, better than some I've seen :) If you wouldn't mind, I may modify it a little and post it here ;)
 
Go for it. If you could make it support tag nesting better, that would be good.

Here is what happens at the moment:
Code:
<xml>
     <tag>
          <tag2>
               <tag3>Hi</tag3>
          </tag2>
     </tag>
</xml>

PHP:
Array
(
     [tag] => xml:complex_tag
     [tag2] => Array
          (
               [tag3] => Hi
          )
)
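One way to get proper nesting, as asked above, is to walk the flat xml_parse_into_struct() output recursively: an 'open' entry starts a child array and its 'close' entry ends it. This is a minimal sketch rather than a change to parseXML; the function names build_tree() and xml_to_array() are hypothetical, and attributes and mixed text content are ignored.

```php
<?php
// Hypothetical helper, not part of parseXML: recursively turn the flat
// xml_parse_into_struct() output into nested arrays. Every tag maps to a
// list of its occurrences, so repeated tags (e.g. several <item>s) are kept.
// Attributes and mixed text ('cdata' entries) are ignored in this sketch.
function build_tree(array $values, &$i)
{
    $out = array();
    while ($i < count($values)) {
        $node = $values[$i++];
        if ($node['type'] === 'complete') {
            // Leaf tag: store its text (or null for an empty tag)
            $out[$node['tag']][] = isset($node['value']) ? $node['value'] : null;
        } elseif ($node['type'] === 'open') {
            // Parent tag: recurse until its matching 'close' entry
            $out[$node['tag']][] = build_tree($values, $i);
        } elseif ($node['type'] === 'close') {
            return $out;
        }
    }
    return $out;
}

// Hypothetical wrapper: parse a string and build the tree (null on malformed XML)
function xml_to_array($xml)
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    if (!xml_parse_into_struct($parser, $xml, $values)) {
        return null;
    }
    $i = 0;
    return build_tree($values, $i);
}

$tree = xml_to_array('<xml><tag><tag2><tag3>Hi</tag3></tag2></tag></xml>');
echo $tree['xml'][0]['tag'][0]['tag2'][0]['tag3'][0], "\n"; // Hi
```

Keeping every tag as a list avoids guessing whether a tag repeats; the cost is the extra [0] when a tag occurs only once.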

Edit: After checking, I found it also parses RSS feeds incorrectly. If you could fix that too, that would be great.

Edit 2: I have made a workaround for RSS feeds (it ignores the <title> and <link> tags inside <image>):

PHP:
class parseXML{
	// File URL
	var $file;
	
	//Private variables
	var $parser;
	var $data;
	var $index_array;
	var $data_array;
	var $xml;
	
	//Tag variables
	var $root_tags = array();
	var $sub_tags = array();
	var $offsets = array();
	
	function presetRSS(){
		// Set up the tags for RSS 2.0
		$this->root_tags = array('title', 'link', 'description', 'language', 'lastBuildDate', 'copyright');
		$this->sub_tags = array('item' => array('title', 'description', 'link', 'guid', 'pubDate', 'category'));
		// Ignore the image <title> and <link> tags
		$this->offsets = array('title' => 1, 'link' => 1);
	}
	
	function print_error(){
		// Display xml error message
		die(sprintf("XML Error: %s at line %d", 
			xml_error_string(xml_get_error_code($this->parser)), 
			xml_get_current_line_number($this->parser) 
		)); 
	}

	function parse_file(){
		// Create the parser
		$this->parser = xml_parser_create();
		// Skip whitespace between elements
		xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
		// Disable upper-casing.
		xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);
		// Read the data into $data
		$this->data = file_get_contents($this->file);
		// Parse $data into two arrays: $data_array gets one entry per open, close or
		// complete tag, and $index_array maps each tag name to its positions in $data_array
		xml_parse_into_struct($this->parser, $this->data, $this->data_array, $this->index_array) or $this->print_error();
		
		// Stick it all in a nice array
		$this->xml = array();
		
		// Handle root tags (format: $xml[tag], or $xml[tag]['value'] and
		// $xml[tag]['xml:attributes'] when the tag has attributes)
		foreach($this->root_tags as $tag){
			// Skip tags that do not appear in the document
			if(!isset($this->index_array[$tag])){
				continue;
			}
			$node = $this->data_array[$this->index_array[$tag][0]];
			$text = isset($node['value']) ? $node['value'] : null;
			// xml_parse_into_struct() stores attributes under 'attributes'
			if(!empty($node['attributes'])){
				$this->xml[$tag] = array('value' => $text, 'xml:attributes' => $node['attributes']);
			}else{
				$this->xml[$tag] = $text;
			}
		}
		
		// Now for child tags (format: $xml[parent][number][child])
		foreach($this->sub_tags as $key => $value){
			if(!isset($this->index_array[$key])){
				continue;
			}
			// Each parent has an 'open' and a 'close' entry in the index
			$children = count($this->index_array[$key]) / 2;
			$i = 0;
			$j = 0;
			while ($i < $children){
				// Skip every 2nd entry, as it is a closing tag and thus has no attributes
				if($j % 2 == 0){
					$parent = $this->data_array[$this->index_array[$key][$j]];
					if(!empty($parent['attributes'])){
						$this->xml[$key][$i]['xml:attributes'] = $parent['attributes'];
					}
					foreach($value as $subtag){
						// A root tag also occurs once at the top level, so skip that occurrence
						if(in_array($subtag, $this->root_tags)){
							$k = $i + 1;
						}else{
							$k = $i;
						}
						// Extra per-tag offsets (e.g. the tags inside the RSS <image>)
						if(isset($this->offsets[$subtag])){
							$k = $k + $this->offsets[$subtag];
						}
						// Skip child tags that have no k-th occurrence
						if(!isset($this->index_array[$subtag][$k])){
							continue;
						}
						$node = $this->data_array[$this->index_array[$subtag][$k]];
						if(isset($this->sub_tags[$subtag])){
							$this->xml[$key][$i][$subtag] = 'xml:complex_tag';
						}else{
							$text = isset($node['value']) ? $node['value'] : null;
							if(!empty($node['attributes'])){
								$this->xml[$key][$i][$subtag] = array('value' => $text, 'xml:attributes' => $node['attributes']);
							}else{
								$this->xml[$key][$i][$subtag] = $text;
							}
						}
					}
					$i ++;
				}
				$j ++;
			}
		}

		//unsetting XML parser object
		xml_parser_free($this->parser);
	}
}
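The offsets in this version are needed because $index_array lists every <title> and every <link> in document order, so the channel's and the image's tags come before the first item's. A minimal sketch on a made-up RSS 2.0 snippet (the example.com URLs are placeholders), using only xml_parse_into_struct(), makes the shift visible:

```php
<?php
// Made-up RSS 2.0 sample: the channel and its <image> each have a <title>
// and a <link> before the first <item>
$rss = '<rss version="2.0"><channel>'
     . '<title>Feed</title><link>http://example.com/</link>'
     . '<image><url>http://example.com/logo.gif</url><title>Feed</title><link>http://example.com/</link></image>'
     . '<item><title>First story</title><link>http://example.com/1</link></item>'
     . '</channel></rss>';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $rss, $values, $index);

// All <title> elements share one list in $index, in document order
foreach ($index['title'] as $n => $pos) {
    echo "title[$n] = {$values[$pos]['value']}\n";
}
// title[0] is the channel title and title[1] the image title, so the first
// item's title is at n = 0 + 1 (root tag) + 1 (offset). <link> is shifted the same way.
```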

Edit 3: If you want to have a look, here is an example with tag attributes:
PHP:
$xml = new parseXML;

$xml->file = 'http://mikor.clearlyhosted.org/webcomics.xml';
$xml->sub_tags = array('comic' => array('name', 'url', 'updated'));

$xml->parse_file();
$parsedXML = $xml->xml;

echo '<pre>';
print_r($parsedXML);
echo '</pre>';
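The contents of webcomics.xml are not shown in the thread, so the sample below is hypothetical. It shows where xml_parse_into_struct() puts attributes: under an 'attributes' key on the tag's 'open' or 'complete' entry, and that key is simply absent when the tag has no attributes. ('xml:attributes' is the class's own naming in its output array, not a key the parser produces.)

```php
<?php
// Hypothetical sample; the real webcomics.xml layout is not shown in the thread
$doc = '<comics>'
     . '<comic id="1"><name>Example</name><url type="home">http://example.com/</url><updated>Mon</updated></comic>'
     . '</comics>';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $doc, $values, $index);

// Attributes sit on the tag's 'open' entry (parents) or 'complete' entry (leaves)
$comic = $values[$index['comic'][0]];
$url   = $values[$index['url'][0]];
echo $comic['attributes']['id'], "\n";  // 1
echo $url['attributes']['type'], "\n";  // home
echo $url['value'], "\n";               // http://example.com/
// A tag without attributes has no 'attributes' key, so test with isset()/empty()
var_dump(isset($values[$index['name'][0]]['attributes'])); // bool(false)
```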
 
Here is a slightly improved version (parse_file() now also returns the parsed array):

PHP:
// Known Bugs:
//	Does not support nesting of complex types eg:
//		<tag><tag2><tag3>hi</tag3></tag2></tag>
//		would display as tag => xml:complex_tag, tag2 => tag3 = "hi"
class parseXML{
	// File URL
	var $file;
	
	//Private variables
	var $parser;
	var $data;
	var $index_array;
	var $data_array;
	var $xml;
	
	//Tag variables
	var $root_tags = array();
	var $sub_tags = array();
	var $offsets = array();
	
	function presetRSS(){
		// Set up the tags for RSS 2.0
		$this->root_tags = array('title', 'link', 'description', 'language', 'lastBuildDate', 'copyright');
		$this->sub_tags = array('item' => array('title', 'description', 'link', 'guid', 'pubDate', 'category'));
		
		// Ignore the image <title> and <link> tags
		$this->offsets = array('title' => 1, 'link' => 1);
	}
	
	function print_error(){
		// Display xml error message
		die(sprintf("XML Error: %s at line %d", 
			xml_error_string(xml_get_error_code($this->parser)), 
			xml_get_current_line_number($this->parser) 
		));
		
		// Return false
		return false;
	}

	function parse_file(){
		// Create the parser
		$this->parser = xml_parser_create();
		// Skip whitespace between elements
		xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
		// Disable upper-casing.
		xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);
		// Read the data into $data
		$this->data = file_get_contents($this->file);
		// Parse $data into two arrays: $data_array gets one entry per open, close or
		// complete tag, and $index_array maps each tag name to its positions in $data_array
		xml_parse_into_struct($this->parser, $this->data, $this->data_array, $this->index_array) or $this->print_error();
		
		// Stick it all in a nice array
		$this->xml = array();
		
		// Handle root tags (format: $xml[tag], or $xml[tag]['value'] and
		// $xml[tag]['xml:attributes'] when the tag has attributes)
		foreach($this->root_tags as $tag){
			// Skip tags that do not appear in the document
			if(!isset($this->index_array[$tag])){
				continue;
			}
			$node = $this->data_array[$this->index_array[$tag][0]];
			$text = isset($node['value']) ? $node['value'] : null;
			// xml_parse_into_struct() stores attributes under 'attributes'
			if(!empty($node['attributes'])){
				$this->xml[$tag] = array('value' => $text, 'xml:attributes' => $node['attributes']);
			}else{
				$this->xml[$tag] = $text;
			}
		}
		
		// Now for child tags (format: $xml[parent][number][child])
		foreach($this->sub_tags as $key => $value){
			if(!isset($this->index_array[$key])){
				continue;
			}
			// Each parent has an 'open' and a 'close' entry in the index
			$children = count($this->index_array[$key]) / 2;
			$i = 0;
			$j = 0;
			while ($i < $children){
				// Skip every 2nd entry, as it is a closing tag and thus has no attributes
				if($j % 2 == 0){
					$parent = $this->data_array[$this->index_array[$key][$j]];
					if(!empty($parent['attributes'])){
						$this->xml[$key][$i]['xml:attributes'] = $parent['attributes'];
					}
					foreach($value as $subtag){
						// A root tag also occurs once at the top level, so skip that occurrence
						if(in_array($subtag, $this->root_tags)){
							$k = $i + 1;
						}else{
							$k = $i;
						}
						// Extra per-tag offsets (e.g. the tags inside the RSS <image>)
						if(isset($this->offsets[$subtag])){
							$k = $k + $this->offsets[$subtag];
						}
						// Skip child tags that have no k-th occurrence
						if(!isset($this->index_array[$subtag][$k])){
							continue;
						}
						$node = $this->data_array[$this->index_array[$subtag][$k]];
						if(isset($this->sub_tags[$subtag])){
							$this->xml[$key][$i][$subtag] = 'xml:complex_tag';
						}else{
							$text = isset($node['value']) ? $node['value'] : null;
							if(!empty($node['attributes'])){
								$this->xml[$key][$i][$subtag] = array('value' => $text, 'xml:attributes' => $node['attributes']);
							}else{
								$this->xml[$key][$i][$subtag] = $text;
							}
						}
					}
					$i ++;
				}
				$j ++;
			}
		}
		
		//unsetting XML parser object
		xml_parser_free($this->parser);
		
		// Return the parsed file
		return $this->xml;
	}
}

PHP:
$xml = new parseXML;

// RSS 2.0 feed (only tested with BBC News)
$xml->file = 'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml';
$xml->presetRSS();

$parsedXML = $xml->parse_file();

echo '<pre>';
print_r($parsedXML);
echo '</pre>';
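The per-tag offsets only fit feeds laid out like the BBC one, and because children are matched by counting occurrences, an item that lacks one of its children shifts every later value. A more general alternative, sketched below under the assumption that each parent tag appears as an 'open'/'close' pair (no self-closing <item/>, no parent nested inside a same-named parent), is to look for children only between a parent's own open and close entries, one level below it. collect_children() is a hypothetical helper, not part of parseXML, and the RSS snippet is made up.

```php
<?php
// Hypothetical helper: collect each <$parent>'s direct leaf children by
// scanning only between its 'open' and 'close' entries, so channel- or
// image-level tags with the same names are never picked up.
function collect_children(array $values, array $index, $parent, array $children)
{
    $result = array();
    $positions = isset($index[$parent]) ? $index[$parent] : array();
    // Assumes the parent's entries come in (open, close) pairs
    for ($p = 0; $p + 1 < count($positions); $p += 2) {
        $open  = $positions[$p];
        $close = $positions[$p + 1];
        $level = $values[$open]['level'] + 1; // direct children sit one level deeper
        $entry = array();
        for ($n = $open + 1; $n < $close; $n++) {
            $node = $values[$n];
            if ($node['level'] === $level && $node['type'] === 'complete'
                && in_array($node['tag'], $children, true)) {
                $entry[$node['tag']] = isset($node['value']) ? $node['value'] : null;
            }
        }
        $result[] = $entry;
    }
    return $result;
}

// Made-up feed: the second item has no <link>
$rss = '<rss><channel><title>Feed</title>'
     . '<image><title>Logo</title></image>'
     . '<item><title>One</title><link>http://example.com/1</link></item>'
     . '<item><title>Two</title></item>'
     . '</channel></rss>';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $rss, $values, $index);

$items = collect_children($values, $index, 'item', array('title', 'link'));
print_r($items);
```

A missing child is simply absent from that item's array instead of being filled in from the next item.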
 