Saturday, June 5, 2010

Java vs. Python: fetching URLs

Here we go, another Java vs. Python comparison (I just can't help myself). This time it's about how useful the standard library is for certain tasks. Fetching the contents of a URL should be trivial, but in Java, it's not. Especially if the contents of that URL are gzipped and use a nice charset such as UTF-8.

Java:

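import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

// Inside a method that declares "throws IOException":
URL url = new URL("http://www.example.com/");
URLConnection conn = url.openConnection();
conn.connect();

// getContentEncoding() returns null when the header is absent
InputStream in;
if ("gzip".equals(conn.getContentEncoding())) {
    in = new GZIPInputStream(conn.getInputStream());
} else {
    in = conn.getInputStream();
}

// Content-Type looks like "text/html; charset=UTF-8" and may also be null
String contentType = conn.getContentType();
BufferedReader reader;
if (contentType != null && contentType.indexOf("charset=") != -1) {
    String charset = contentType.substring(contentType.indexOf("charset=") + 8);
    reader = new BufferedReader(new InputStreamReader(in, charset));
} else {
    // No charset declared: fall back to the platform default
    reader = new BufferedReader(new InputStreamReader(in));
}

StringBuilder builder = new StringBuilder();
String line = reader.readLine();
while (line != null) {
    builder.append(line).append('\n');
    line = reader.readLine();
}
reader.close();
String content = builder.toString(); // FINALLY!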

Python:

import urllib2

content = urllib2.urlopen("http://www.example.com/").read()

At first I thought, yeah, well, Java is probably older and wasn't designed to do such things very often. I was wrong. Java appeared in 1995, Python in 1991.
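To be fair to Java, the Python one-liner above hands back the raw bytes of the response body: it does not ask for gzip, gunzip anything, or decode the charset. Here is a minimal sketch of what the like-for-like handling might look like. It assumes Python 3 and urllib.request rather than the urllib2 used above; the helper names decode_body and fetch, and the UTF-8 fallback when no charset is declared, are my own assumptions and not part of the standard library.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=None, content_type=None,
                default_charset="utf-8"):
    """Gunzip (if needed) and decode a raw response body to text.

    default_charset is an assumed fallback for responses that do not
    declare a charset in their Content-Type header.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)

    charset = default_charset
    if content_type:
        # Content-Type looks like "text/html; charset=UTF-8"
        for param in content_type.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.lower() == "charset" and value:
                charset = value.strip('"')
    return raw.decode(charset)


def fetch(url):
    """Fetch a URL, advertising gzip support, and return the body as text."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return decode_body(
            response.read(),
            response.headers.get("Content-Encoding"),
            response.headers.get("Content-Type"),
        )
```

With that in place, content = fetch("http://www.example.com/") would be the rough equivalent of the Java snippet. It is still a good deal shorter, but no longer a one-liner.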

2 comments:

  1. In retrospect, Java drew heavily on implementations from other languages, and Python probably wasn't mature enough for such a vast library. You can't treat programming languages like children who start studying the moment they're born. I doubt Python at the time had a function for fetching URL content, let alone that gzip was in popular use as an HTTP content encoding.

  2. I would think that the underlying Python code does something similar. I could easily wrap the Java code in a function.
