I'm working on an app where I need to parse some HTML. This is the first time I've had to do screen-scraping with Groovy. After a bit of trial and error I think I'm getting the hang of it. The HTML I'm working with isn't well-formed, so the default Groovy XmlSlurper and XmlParser puke. After some digging I found TagSoup. It "parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short".
It made my parsing much easier. Thanks John Cowan!
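For anyone trying the same thing, here's a minimal sketch of plugging TagSoup into XmlSlurper. It assumes Groovy 3.x or later, where XmlSlurper lives in groovy.xml. It also pulls TagSoup 1.2.1 from Maven Central with @Grab, so it needs network access the first time it runs. The sample HTML and the hrefs variable are made up for illustration. XmlSlurper has a constructor that takes any SAX XMLReader, and TagSoup's Parser is one. TagSoup repairs the markup, and XmlSlurper then builds its usual GPath tree from the result.

```groovy
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
import groovy.xml.XmlSlurper

// Deliberately sloppy HTML: unclosed <p> tags, a bare <br>, no </a>, </p> or </html>
def sloppyHtml = '<html><body><p>First<p>Links: <a href="/one">one</a><br><a href="/two">two</body>'

// TagSoup's Parser implements org.xml.sax.XMLReader, so XmlSlurper uses it
// instead of its default (strict) XML parser
def page = new XmlSlurper(new Parser()).parseText(sloppyHtml)

// '**' walks the repaired tree; collect the href attribute of every <a> element
def hrefs = page.'**'.findAll { it.name() == 'a' }.collect { it.@href.text() }
println hrefs
```

A plain `new XmlSlurper().parseText(sloppyHtml)` should fail on the same input, because the string isn't well-formed XML.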
A blog (mostly) about nothing by a software engineer who loves to spend time with his family, golf, run, play hockey, and discuss all things related to science, politics and our civilization.
Tuesday, December 29, 2009
Groovy XmlSlurper and HTTP 503 Response Code
I struggled a bit when trying to parse some XHTML with Groovy's XmlSlurper (and XmlParser). I was receiving the following:
Caught: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
It turns out that the guys from W3C got sick of dealing with the excessive traffic for their DTDs. So now they return a Service Unavailable (HTTP 503) if they detect parser requests.
To solve the problem, I had to turn off loading of external DTDs. Here's the code:
def slurper = new XmlSlurper()
// Don't fetch external DTDs (such as the W3C's XHTML DTD) while parsing
slurper.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
// htmlResponse holds the XHTML string returned by the server
def results = slurper.parseText(htmlResponse)
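On newer Groovy versions, this snippet needs one more tweak. As far as I can tell, recent releases configure the no-argument XmlSlurper constructor to reject any DOCTYPE declaration, so an XHTML page fails before the DTD even comes into play. The sketch below assumes Groovy 3.x or later, where XmlSlurper lives in groovy.xml. It uses the three-argument constructor (validating, namespaceAware, allowDocTypeDeclaration). The sample page is made up. With external DTD loading switched off, nothing should be fetched from w3.org, so it ought to run offline.

```groovy
import groovy.xml.XmlSlurper

// Sample XHTML whose DOCTYPE references the W3C-hosted DTD
def htmlResponse = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body><p>Hello from XHTML</p></body>
</html>'''

// validating = false, namespaceAware = true, allowDocTypeDeclaration = true
def slurper = new XmlSlurper(false, true, true)

// Xerces feature (also recognized by the JDK's built-in parser): skip the external DTD
slurper.setFeature('http://apache.org/xml/features/nonvalidating/load-external-dtd', false)

def results = slurper.parseText(htmlResponse)
println results.head.title.text()
println results.body.p.text()
```

One catch: because the DTD is never loaded, the entities it defines (such as `&nbsp;`) are never declared. A page that uses them will still fail to parse.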
Googling for the answer wasn't extremely helpful. This blog post helped (I think it's in Japanese). This post also helped. Thanks guys!
I decided to re-post the solution since it took me a while to find the answer by googling.
Labels: Groovy, HTTP response code: 503, XML, XmlParser, XmlSlurper
Sunday, December 27, 2009
The Science of Avatar
An interesting read on the science of Avatar. I still haven't seen the movie; just too much going on with the holidays.
Tuesday, December 8, 2009
(Near) Real-Time Analytics
At my new gig, I've been asking whether the team has considered the possibility of using map/reduce or a similar grid-based solution to conduct our analytics in (near) real-time. Interestingly enough, I ran across Nati Shalom's post on real-time analytics yesterday. This should help give me some ammunition to convince everyone that we need to move in this direction for the solution we're building. Thanks Nati!
A Feast for Crows
I just finished re-reading George R. R. Martin's A Feast for Crows. I enjoyed it more than the first time I read it. My favorite is still A Storm of Swords. Now if he'd just publish A Dance with Dragons!
Wednesday, May 13, 2009
The Definitive Guide to Grails
I finished up The Definitive Guide to Grails close to a month ago, but I forgot to blog about it (I'm using my blog to help keep track of which books I've read). It was an excellent read. I'm sold on Groovy and Grails, particularly for Java shops. Given the recent SpringSource purchase of G2One, I expect Groovy and Grails to gain much wider adoption in the enterprise.
Sunday, March 22, 2009
The Productive Programmer
I just finished The Productive Programmer by Neal Ford. It was so good that I decided to buy my own copy. It definitely made me realize how much more efficient I can make myself. There are a ton of tips for both Mac and Windows. One of the major themes was to automate everything you can. Thanks Neal!
Friday, March 13, 2009
No Fluff Just Stuff
Today is the first day of the Twin Cities Software Symposium. The first talk I attended was REST: Information Driven Architectures for the 21st Century by Brian Sletten. Very informative, and definitely not an introductory REST talk. I'm curious to hear Brian's Semantic SOA talk later today. One thing that stuck with me was Jon Postel's "Be liberal in what you accept, and conservative in what you send."
A couple of things Brian talked about that sound worth investigating are Sinatra, a DSL for building web applications in Ruby, and retrievr, which lets you find Flickr photos by creating a sketch of what you're after.
Brian's new gig sounds pretty cool: League of Legends.
Sunday, March 1, 2009
Einstein
I recently finished reading Einstein: His Life and Universe, a biography by Walter Isaacson. I enjoyed it very much. It gave excellent insight into the man. The most interesting thing to me was that even though he was brilliant, he struggled with things in his everyday life just like everyone else. His family life wasn't perfect and neither were many other aspects of his life. I love the fact that he was a nonconformist not only in science but in political affairs as well.
Saturday, February 14, 2009
Crystal Clear
I just finished Crystal Clear by Alistair Cockburn. A very good book, but it was a stretch to get it to 300 pages. The first chapter threw me for a loop the way it was structured, and the last chapter, a case study, was a dud. While the team size for Crystal should be 8 or fewer, a case study with 1.5 developers doesn't sound like a very good case study. However, the chapters in between were excellent. Cockburn admits he structured each chapter differently in an attempt to cater to different readers. It gave me some insights into how a successful team should interact and was very complementary to the other Agile documentation that I've seen. I definitely liked his guidance on Walking Skeleton and Incremental Re-architecture. It helped me reinforce the concept of an Architectural Slice that I've been conveying to the folks on the large Java project that I'm currently working on.
Kudos to Andy Miller for recommending this book!
Shipped It!
Jared Richardson of Ship It! fame spoke at the TCJUG on Monday. His topic was your career. He started out a bit slow, but the pace picked up as his presentation progressed. I think most folks got quite a bit out of it, but it reminded me of several presentations I had seen before. In particular, there was some overlap with a presentation by Dave Thomas that I went to several years ago at NFJS Denver. Dave's theme was investing in your career. That was the first time I heard Dave's infamous "Herding Racehorses and Racing Sheep".
I did learn some new stuff at Jared's presentation. In particular, I learned about qik, which looks pretty cool. It allows you to share a live video feed from your phone; Jared had someone in the audience do a live feed to qik using Jared's iPhone. I also liked Jared's acronym for public speaking: (L)ock eyes, (I)ntonation, (P)ause. This will definitely come in handy for me in the future.
The part that surprised me the most was how few people in the audience knew about blogs and feed readers. Another shocker was how few people had heard of The Pragmatic Programmer. Maybe people were just too lazy to raise their hands. If not, c'mon TCJUG attendees!
Thanks for flying all the way to Minnesota to enlighten us, Jared!
Saturday, January 31, 2009
Groovy Encapsulation - Say What?
I'm reading Groovy Recipes by Scott Davis and find this troubling:
class Book3{
private String title
private String getTitle(){}
private void setTitle(title){}
}
def b3 = new Book3()
b3.@title = "Groovy Recipes"
println b3.@title
===> Groovy Recipes
In Groovy, private attributes can be modified, even if you use private setters. That's not cool. I'm hoping there's some way to enforce encapsulation, but it's not looking good right now.
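It isn't just fields, either. Here's a small sketch showing that, in dynamically compiled Groovy, code outside a class can also call a private method. I haven't checked how @CompileStatic treats this. The describe() method is a hypothetical addition to the Book3 example, not something from Groovy Recipes.

```groovy
class Book3 {
    private String title

    // Hypothetical private helper; in Java only code inside Book3 could call it
    private String describe() {
        "Book: ${title}"
    }
}

def b3 = new Book3()
b3.@title = 'Groovy Recipes'      // direct field access, as in the snippet above
def description = b3.describe()   // dynamic dispatch reaches the private method anyway
println description
```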
Sunday, January 25, 2009
Groovy!
I just finished Getting Started with Grails, the free book from InfoQ by Jason Rudolph. An excellent book. Thanks Jason! I thought it was so good that I decided to pay for it (even though it's free) to help support the author and InfoQ. There are a few discrepancies because the book is almost two years old and quite a bit has changed in Grails since then, but I had very few problems working through the examples with the latest version of Grails.
I'm definitely on the Groovy and Grails learning train. Both still seem very promising to me (being a Java guy) and I'm going to continue investigating both. While I set out to learn more about Ruby and Rails this year, the winds have shifted and I'm now focused on Groovy and Grails. I purchased the PDF version of Scott Davis' book Groovy Recipes and The Definitive Guide to Grails.
Saturday, January 17, 2009
Hockey Day Minnesota
Today is Hockey Day Minnesota. I'm going to try to honor that by getting my daughters out on the neighbor's pond followed up by taking them to the University of Minnesota Women's game against Bemidji State. It'll then be on to termite practice where the kids will get to play pond hockey instead of practicing. We'll wind it down with watching some of the Gopher and Wild games. It should be a ton of fun!
Code Freeze
I attended the Code Freeze conference at the University of Minnesota on Thursday. It was an excellent local event, especially considering it was an all-day event for only $90. This was the first year I've attended Code Freeze; this year's theme was Maximizing Developer Value. Neal Ford kicked things off, and as usual he knocked it out of the park. His topic was On the Lamb from the Furniture Police. It covered the fact that as programmers we're hired to concentrate for long periods of time, yet corporate environments have the exact opposite effect.
Other speakers included Luke Francl, Nate Schutta, Susan Standiford, Andy Miller and Tomo Lennox. I was very impressed with Nate's presentation; it seemed to pick up directly where Neal left off. I was particularly interested in Nate's comments about the workings of the human brain, as it is an area of interest for me.
I was also intrigued by Andy's presentation entitled "Why I don’t estimate with 'points' (and how you too can be delivered from the tedium of repetitive estimation)". Andy and I are currently at the same client, working together on a large re-engineering project. I haven't worked with Andy for long, but I was very impressed with his presentation and his pragmatic approach to estimating. It definitely opened my eyes to new ideas.
Saturday, January 10, 2009
Hackers and Painters
One of my 2009 resolutions is to read more. I just finished Hackers and Painters. I had made it halfway through a couple of years ago, so I started over and made it all the way through this time. A quick and insightful read. Paul Graham is fairly opinionated, which makes for a good read. I've never seen Lisp, but I am very curious whether it's as good as he claims. Based on my experiences, I definitely agree with his thoughts on development productivity. It makes me that much more interested in learning Ruby and Rails. The last time I worked seriously with a dynamic language was with Perl in college, when we were building the Heil X6 SMT's kernel, simulator and assembler. The Heil X6 was the computer we (a team of 6 graduate and undergraduate students) built in our ECE 554 Digital Engineering Lab using FPGAs that we had to hand-wire. What a trip!
Groovy and Grails
At the DevJam Jam Session on Wednesday, Mike Hugo gave a presentation on Groovy and Grails. I had heard Scott Davis speak about Groovy and Grails over a year ago and thought it was pretty cool, but back then it didn't feel real. Since then Grails has reached a 1.0 release, and, put simply, I was extremely impressed. Mike is a very good presenter; his slides had very few words and some great pictures. He kept it lively with some humorous slides plus some succinct demos. I've been looking at Ruby on Rails off and on over the last couple of years. One of my New Year's resolutions for 2009 was to really dig into Rails and write a decent-sized application. However, after Mike's presentation I began to think that a better investment of my time may be in Grails, given that it runs on the JVM and plays very nicely with the Java language. I downloaded Grails yesterday and played around with it a bit. Very nice. Just like with Rails, it's amazingly easy to get a site up and running quickly. Grails is still a 1.0 release, so I have some concerns about its maturity. However, it runs on the JVM and uses Spring, Hibernate and SiteMesh under the covers, all of which are very mature.
After the presentation by Mike, we entered the Fish Bowl round table discussion, which I believe is the best part of the Jam Sessions. I'm amazed by the brain power in the room at these sessions, and I definitely like the Fight Club mentality. The conversation touched on a number of topics, including the advantages and disadvantages of statically vs. dynamically typed languages. The consensus seemed to be that we're definitely seeing a movement toward dynamically typed languages. Someone even dropped the Lisp quote along the lines of Hackers and Painters ("...we could do that in Lisp back in 1968..."), which was interesting given that I just finished the book. One person, who seemed to know a ton about language theory (he even dropped F#), mentioned that he thinks the gulf between the two is getting bigger and bigger: static languages are becoming more strongly typed, and dynamic languages are becoming (or will become) more dynamic. It makes sense to me given where I see the Java language headed, and it makes my desire to learn Ruby and Groovy that much stronger.