Powerful XPath and XQuery: How to extract RSS feeds from an HTML file

Wow.. XQuery coupled with XPath is really powerful.

Given this HTML, how can you retrieve the href of every link element whose rel attribute is "alternate"?

<html xmlns="http://www.w3.org/1999/xhtml">

<head profile="http://gmpg.org/xfn/11">

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

	<title>iPhone Programming Tutorial</title>
	
	<!--[if IE]>
	<style> 
	#content { top:0px;	}
	</style>
	<![endif]-->
	
	<!--[if IE 6]>
	<style type="text/css">
	.left-content {
		float: left;
		width: 620px;	
	}
	.text-box {
		float: left;
		width: 620px;
		padding: 0 26px 0 27px;
	}
	</style>
	<![endif]-->
	
	

<meta name="generator" content="WordPress 2.8.1" /> <!-- leave this for stats -->

<link rel="stylesheet" href="http://icodeblog.com/wp-content/themes/bluez/style.css" type="text/css" media="screen" />
	
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://icodeblog.com/feed/" />

<link rel="alternate" type="text/xml" title="RSS .92" href="http://icodeblog.com/feed/rss/" />

<link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://icodeblog.com/feed/atom/" />

<link rel="pingback" href="http://icodeblog.com/xmlrpc.php" />

	<link rel='archives' title='August 2009' href='http://icodeblog.com/2009/08/' />
	<link rel='archives' title='July 2009' href='http://icodeblog.com/2009/07/' />
	<link rel='archives' title='June 2009' href='http://icodeblog.com/2009/06/' />
	<link rel='archives' title='May 2009' href='http://icodeblog.com/2009/05/' />
	<link rel='archives' title='March 2009' href='http://icodeblog.com/2009/03/' />
	<link rel='archives' title='February 2009' href='http://icodeblog.com/2009/02/' />
	<link rel='archives' title='January 2009' href='http://icodeblog.com/2009/01/' />
	<link rel='archives' title='December 2008' href='http://icodeblog.com/2008/12/' />
	<link rel='archives' title='November 2008' href='http://icodeblog.com/2008/11/' />
	<link rel='archives' title='October 2008' href='http://icodeblog.com/2008/10/' />
	<link rel='archives' title='September 2008' href='http://icodeblog.com/2008/09/' />
	<link rel='archives' title='August 2008' href='http://icodeblog.com/2008/08/' />
	<link rel='archives' title='July 2008' href='http://icodeblog.com/2008/07/' />

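The XQuery that does it is shown below. Note that the sample declares XHTML as its default namespace, so a namespace-aware processor will also need declare default element namespace "http://www.w3.org/1999/xhtml"; in the query prolog before these unprefixed paths will match.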

for $p in /html/head/link
where $p/@rel = "alternate"
return $p/@href

It will return:

href="http://icodeblog.com/feed/"
href="http://icodeblog.com/feed/rss/"
href="http://icodeblog.com/feed/atom/"
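
For comparison, here is a rough sketch of the same extraction in Python, using the limited XPath support in the standard library's xml.etree.ElementTree. It is not part of the original query. The names SAMPLE, NS and feed_hrefs are made up for illustration, and SAMPLE is a trimmed copy of the head above with closing tags added so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample head (assumption: closing tags added so it is well-formed XML).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://gmpg.org/xfn/11">
<title>iPhone Programming Tutorial</title>
<link rel="stylesheet" href="http://icodeblog.com/wp-content/themes/bluez/style.css" type="text/css" media="screen" />
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://icodeblog.com/feed/" />
<link rel="alternate" type="text/xml" title="RSS .92" href="http://icodeblog.com/feed/rss/" />
<link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://icodeblog.com/feed/atom/" />
<link rel="pingback" href="http://icodeblog.com/xmlrpc.php" />
</head>
</html>"""

# Every element in the sample lives in the XHTML namespace, so the path has to name it.
# The prefix "x" is arbitrary (hypothetical); only the namespace URI matters.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def feed_hrefs(xhtml_text):
    """Return the href of every <link rel="alternate"> under <head>, in document order."""
    root = ET.fromstring(xhtml_text)  # root is the <html> element
    links = root.findall("x:head/x:link[@rel='alternate']", NS)
    return [link.get("href") for link in links]


if __name__ == "__main__":
    for href in feed_hrefs(SAMPLE):
        print(href)
```

Unlike the XQuery, which returns attribute nodes, this returns the plain href strings. It also only works on well-formed XHTML; real-world HTML would need a forgiving HTML parser first.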

Very powerful and very impressive!
