Web Scraping

Web scraping is a technique for extracting the information you need from web pages. You could do it by visiting each page and copying and pasting the content by hand, but that is not practical when you have thousands of items to extract. A short script makes the job much easier. Here I have an example of a web scraper that scrapes the names of the top 250 movies by IMDb rating.

This scraper is written in Python using the “Mechanize” and “BeautifulSoup” modules. The Mechanize module is used to browse web pages from Python. We could use “urllib” instead, but urllib on its own does not deal with robots.txt rules or anti-bot checks, so Mechanize provides some extra features over urllib. Since imdb.com uses anti-bot measures, we cannot browse it with plain urllib. The “BeautifulSoup” module is one of the most popular and feature-rich modules for parsing HTML content.
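To make the division of work between the two modules concrete, here is a minimal sketch; it is not the downloadable script itself. The sample HTML, the "titleColumn" class name, and the function names fetch_html and extract_titles are assumptions made up for illustration, and the real IMDb markup is different and changes over time. The parsing part runs on the inline sample, so it needs only BeautifulSoup; Mechanize is needed only if you call fetch_html.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a ranked-list page. The real IMDb
# page uses different tags and class names.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn"><a href="/title/1/">First Movie</a></td></tr>
  <tr><td class="titleColumn"><a href="/title/2/">Second Movie</a></td></tr>
</table>
"""


def fetch_html(url):
    """Download a page with Mechanize and return its raw HTML."""
    import mechanize  # imported here so the parsing part runs without it

    browser = mechanize.Browser()
    # Mechanize follows robots.txt by default; this switches that off.
    # Only do so where the site's terms allow it.
    browser.set_handle_robots(False)
    # Send a browser-like User-agent header instead of the default one.
    browser.addheaders = [("User-agent", "Mozilla/5.0")]
    return browser.open(url).read()


def extract_titles(html):
    """Return the link text of every cell marked with the assumed class."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for cell in soup.find_all("td", class_="titleColumn"):
        link = cell.find("a")
        if link is not None:
            titles.append(link.get_text(strip=True))
    return titles


print(extract_titles(SAMPLE_HTML))  # ['First Movie', 'Second Movie']
```

For a live page you would pass the result of fetch_html(url) to extract_titles, after adjusting the tag and class names to match the page you are scraping.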

You can download my example from here. You must have both modules installed to run the script.