I would like to iterate over the URLs in a Google sitemap.xml file and check that none of them returns an error (no 404s, no 500s). My sample XML is the following snippet:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url><loc>http://www.mysite.com/</loc><lastmod>2009-04-09T18:33:21+00:00</lastmod><changefreq>daily</changefreq><priority>1.00</priority></url>
<url><loc>http://www.mysite.com/register.jsp</loc><lastmod>2009-04-09T18:33:18+00:00</lastmod><changefreq>daily</changefreq><priority>0.50</priority></url>
</urlset>
These are the steps to get Java objects in your code representing the above XML.
1. Download Castor from here
2. Create an XSD file for the above XML (there are online schema generators, like the one from HIT Software):
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="changefreq">
<xs:complexType mixed="true" />
</xs:element>
<xs:element name="lastmod">
<xs:complexType mixed="true" />
</xs:element>
<xs:element name="loc">
<xs:complexType mixed="true" />
</xs:element>
<xs:element name="priority">
<xs:complexType mixed="true" />
</xs:element>
<xs:element name="url">
<xs:complexType>
<xs:sequence>
<xs:element ref="loc" />
<xs:element ref="lastmod" />
<xs:element ref="changefreq" />
<xs:element ref="priority" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="urlset">
<xs:complexType>
<xs:sequence>
<xs:element ref="url" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="xsi:schemaLocation" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:schema>
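To picture what the code generator is expected to produce from this schema, here is a hand-written approximation in plain Java. It is only a sketch: the accessor names `getUrlCount()`, `getUrl(int)`, `getLoc()` and `getContent()` are the ones used in the test case in step 4, while the class `SitemapModelSketch`, the setters and `addUrl` are assumptions made for illustration. The real generated classes contain much more (marshalling, validation, the remaining elements).

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapModelSketch {

    /** Stand-in for the "loc" element: its mixed content is exposed as a String. */
    public static class Loc {
        private String content = "";
        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }
    }

    /** Stand-in for the "url" element; only the "loc" child is modelled here. */
    public static class Url {
        private Loc loc;
        public Loc getLoc() { return loc; }
        public void setLoc(Loc loc) { this.loc = loc; }
    }

    /** Stand-in for "urlset": maxOccurs="unbounded" becomes an indexed collection. */
    public static class Urlset {
        private final List<Url> urls = new ArrayList<>();
        public void addUrl(Url url) { urls.add(url); }
        public Url getUrl(int index) { return urls.get(index); }
        public int getUrlCount() { return urls.size(); }
    }

    /** Builds a Urlset by hand from plain addresses (hypothetical helper). */
    public static Urlset fromAddresses(String... addresses) {
        Urlset urlset = new Urlset();
        for (String address : addresses) {
            Loc loc = new Loc();
            loc.setContent(address);
            Url url = new Url();
            url.setLoc(loc);
            urlset.addUrl(url);
        }
        return urlset;
    }

    public static void main(String[] args) {
        // The two addresses from the sample sitemap above.
        Urlset urlset = fromAddresses(
                "http://www.mysite.com/",
                "http://www.mysite.com/register.jsp");
        for (int i = 0; i < urlset.getUrlCount(); i++) {
            System.out.println(urlset.getUrl(i).getLoc().getContent());
        }
    }
}
```

Because `loc` is declared as a complex type with mixed content rather than as a plain `xs:string`, the address is reached through `getLoc().getContent()` instead of a single `getLoc()` call returning a String.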
3. Generate Java objects from the schema using an Ant task.
4. Write a simple test case to test the code generation:
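// The method name is illustrative; the original method header was not shown.
public void testSitemapUnmarshal() throws Exception {
    // Settings is my own helper; "sitemap" is the key for the sitemap file location.
    FileReader reader = new FileReader(Settings.getInstance("sitemap"));
    Urlset urls = Urlset.unmarshal(reader);
    assertNotNull(urls);
    // 35 is the number of URLs in my full sitemap (the snippet above shows only two).
    assertEquals(35, urls.getUrlCount());
    for (int i = 0; i < urls.getUrlCount(); i++) {
        org.castor.sitemap.Url theUrl = urls.getUrl(i);
        String currentUrl = theUrl.getLoc().getContent();
        System.out.println(currentUrl);
        assertNotNull(currentUrl);
    }
}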
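The test case only checks that each URL string is present. The original goal, making sure that none of the URLs answers with a 404 or a 500, could be sketched roughly as follows, using only the JDK. The class `SitemapLinkChecker` and its methods are hypothetical names, and the sketch assumes that a HEAD request is enough, that any status of 400 or above counts as an error, and that an unreachable host counts as an error too.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

public class SitemapLinkChecker {

    /** Treats 4xx and 5xx statuses (404, 500, ...) and failed requests (-1) as errors. */
    public static boolean isError(int status) {
        return status < 0 || status >= 400;
    }

    /** Sends a HEAD request and returns the HTTP status code, or -1 if the request fails. */
    public static int fetchStatus(String address) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // milliseconds
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Malformed address, non-HTTP scheme, or network failure.
            return -1;
        }
    }

    /** Returns the URLs whose status counts as an error, in input order. */
    public static List<String> findBroken(List<String> urls, ToIntFunction<String> statusOf) {
        List<String> broken = new ArrayList<>();
        for (String url : urls) {
            if (isError(statusOf.applyAsInt(url))) {
                broken.add(url);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // URLs are passed on the command line here; in the test case they would
        // be collected from theUrl.getLoc().getContent() inside the loop.
        for (String url : findBroken(Arrays.asList(args), SitemapLinkChecker::fetchStatus)) {
            System.out.println("Broken: " + url);
        }
    }
}
```

The status lookup is passed in as a function so that the filtering logic can be exercised with a stub instead of real network calls. In a test, `assertTrue(findBroken(urls, SitemapLinkChecker::fetchStatus).isEmpty())` would then fail whenever a URL in the sitemap is broken.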
P.S. I had a problem with a required attribute. It seems Castor was not mapping the xsi:schemaLocation attribute correctly. In my case I modified the generated XSD as a workaround; I am still looking into this issue.