Tuesday, December 29, 2009

Groovy XmlSlurper and HTTP 503 Response Code

I struggled a bit when trying to parse some XHTML with Groovy's XmlSlurper (and XmlParser). I was receiving the following:

Caught: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

It turns out that the W3C got tired of the excessive traffic to their DTDs, so their servers now return a Service Unavailable (HTTP 503) when they detect requests coming from parsers.

To solve the problem I had to disable the loading of external DTDs, so the parser never requests the DTD from w3.org. Here's the code:

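// The default constructor gives a non-validating, namespace-aware slurper
def slurper = new XmlSlurper()
// Tell the underlying Xerces parser not to fetch the external DTD
slurper.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
// htmlResponse is a String holding the XHTML document
def results = slurper.parseText(htmlResponse)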
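
For anyone who wants to try this without a live page, here is a minimal self-contained sketch. The XHTML sample is made up for illustration, and the import assumes Groovy 3 or later (in older versions XmlSlurper lives in groovy.util and needs no import). I'm also assuming a Xerces-based parser, such as the one bundled with the JDK. Newer Groovy releases reject DOCTYPE declarations by default, so the sketch switches that check off as well; on older versions that extra line should be harmless.

```groovy
import groovy.xml.XmlSlurper  // assumption: Groovy 3+; drop this import on Groovy 2.x and earlier

// Hypothetical XHTML sample standing in for htmlResponse
def htmlResponse = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body><p>Hello</p></body>
</html>'''

def slurper = new XmlSlurper()
// Assumption: newer Groovy versions disallow DOCTYPE declarations unless told otherwise
slurper.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
// The actual fix: do not fetch the external DTD from w3.org
slurper.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)

def results = slurper.parseText(htmlResponse)
println results.head.title.text()
```

Since the DTD is never loaded, entities defined only in the DTD (such as &nbsp;) won't be resolved, so keep that in mind if your pages rely on them.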

Googling for the answer wasn't extremely helpful. This blog post helped (I think it's in Japanese). This post also helped. Thanks guys!

I decided to re-post the solution since it took me a while of googling to find the answer.

1 comment:

Anonymous said...

10x! Found your posting by Google