HTML Parsing With Groovy and TagSoup
I'm working on an app where I need to parse some HTML, and this is the first time I've had to do screen-scraping with Groovy. After a bit of trial and error, I think I'm getting the hang of it. The HTML I'm working with isn't well-formed, so Groovy's XmlSlurper and XmlParser, which expect well-formed XML by default, puke on it. After some digging I found TagSoup, a SAX-compliant parser that "parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short".
It made my parsing much easier. Thanks John Cowan!
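For anyone wanting to try the same thing, here is a rough sketch of how TagSoup can be plugged into XmlSlurper. It isn't the exact code from my app: the sample HTML string, the variable names, and the TagSoup version in the `@Grab` line are all assumptions made up for illustration. The idea is simply that XmlSlurper accepts any SAX `XMLReader` in its constructor, and TagSoup's `Parser` is one.

```groovy
// Sketch only: the version number and sample markup below are assumptions.
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
// Groovy 3+ package; on older Groovy versions XmlSlurper lives in groovy.util
// and needs no import.
import groovy.xml.XmlSlurper

// Deliberately sloppy HTML: unclosed <p>, <a> and <br> tags, no <head>.
def html = '<html><body><p>First<p>Second <a href="/one">link<br></body>'

// Hand TagSoup's SAX parser to XmlSlurper in place of the default XML parser.
def slurper = new XmlSlurper(new Parser())
def page = slurper.parseText(html)

// Depth-first search for every <a> element, collecting the href attributes.
def links = page.'**'.findAll { it.name() == 'a' }.collect { it.@href.text() }

println "Paragraphs found: ${page.body.p.size()}"
println "Links found: ${links}"
```

Once the slurper is built this way, the usual GPath navigation (`page.body.p`, `it.@href`, and so on) should work on the cleaned-up tree just as it would on well-formed XML.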
Labels: Groovy, Screen-Scraping, TagSoup, XmlSlurper

