Encrypt in Node.js and decrypt in PHP, or vice versa

If you are trying to encrypt in PHP and decrypt in Node.js, or vice versa, the first thing you need to know is that only MCRYPT_RIJNDAEL_128 is compatible with aes-128-cbc (Rijndael with a 128-bit block size is AES). So you must use MCRYPT_RIJNDAEL_128 in PHP and aes-128-cbc in Node.js. Node.js's crypto module provides a wide variety of encryption/decryption algorithms. The other thing you should know is that the key and the initialization vector (IV) must be the same in PHP and Node.js.

Encrypting in Node.js:

var crypto = require('crypto');
var key = 'MySecretKey12345';
var iv = '1234567890123456';
var cipher = crypto.createCipheriv('aes-128-cbc', key, iv);

var text = 'plain text';
var encrypted = cipher.update(text, 'utf8', 'hex');
encrypted += cipher.final('hex');

console.log('Encrypted: ', encrypted);

//output: 9eb6c9052d1de4474fb52d829360d5af

Decrypting in PHP (note: the mcrypt extension was deprecated in PHP 7.1 and removed in 7.2, so on modern PHP use openssl_decrypt with 'aes-128-cbc' instead):

private $encryptKey = 'MySecretKey12345';
private $iv = '1234567890123456';
private $blocksize = 16;

public function decrypt($data)
{
    return $this->unpad(mcrypt_decrypt(MCRYPT_RIJNDAEL_128,
        $this->encryptKey,
        hex2bin($data),
        MCRYPT_MODE_CBC, $this->iv), $this->blocksize);
}

private function unpad($data, $blocksize)
{
    $len = mb_strlen($data);
    $pad = ord( $data[$len - 1] );
    // PKCS#7 padding is 1..$blocksize bytes, each equal to the pad length,
    // so the pad value may equal the block size (hence <=, not <)
    if ($pad && $pad <= $blocksize) {
        if (substr($data, -$pad) === str_repeat(chr($pad), $pad)) {
            return mb_substr($data, 0, $len - $pad);
        }
    }
    return $data;
}

//echo $this->decrypt('9eb6c9052d1de4474fb52d829360d5af')
// output:  plain text

Full version in gist.

Scraping – Node.js vs PHP

One more example of screen scraping, this time in both Node.js and PHP. This is more of a benchmark than an example. The task was simple: get all the team names of the Fantasy Premier League from the first 200 pages, so there were 200 requests in total (one request per page).
The URL for the first page was http://fantasy.premierleague.com/my-leagues/303/standings/?ls-page=1
The external libraries used were cheerio for Node.js and phpQuery for PHP.

Node.js

var start = +new Date();
var request = require('request');
var cheerio = require('cheerio');
var total_page = 200;
var page = 1;
var header = ['', 'Rank', 'Team', 'Name', 'Point', 'Total'];

console.log("Page number, Time taken");

while (page <= total_page) {
    var url = 'http://fantasy.premierleague.com/my-leagues/303/standings/?ls-page='+page;
    request(url, (function(i) {
        return function (error, response, body) {
            var $ = cheerio.load(body);
            $('.ismStandingsTable').find('tr').each(function(index, elem){
                $(this).find('td').each(function(head){
                    if (head == 2) {
                        //console.log(header[head]+ ' : '+$(this).text());
                        //console.log($(this).text());
                    }
                });
            });
            var end = +new Date();
            console.log(i +", "+(end-start)/1000);
        }
    })(page)); //bind everything with page number
    page++;
}

PHP

<?php
require('phpQuery/phpQuery.php');

$time_start = microtime(true);
$total_page = 200;
$page = 1;
$header = array('', 'Rank', 'Team', 'Name', 'Point', 'Total');

echo ("Page number, Time taken");
while($page <= $total_page) {
    $doc = phpQuery::newDocumentFileHTML('http://fantasy.premierleague.com/my-leagues/303/standings/?ls-page='.$page);
    foreach (pq('.ismStandingsTable tr') as $data) {
        foreach (pq('td', $data) as $key => $val) {
            if ($key == 2) {
                //print pq($val)->text();
            }
        }
    }
    $time_end = microtime(true);
    $execution_time = $time_end - $time_start;
    echo ("\n".$page.", ".$execution_time);
    $page++;
}

?>

Node.js took 175.535 sec to complete, whereas PHP took 711.790 sec. PHP was four times slower than Node.js.

Here is the graph of time taken per page to complete the task for each request.
[Figure: nodejsvsphp]

Updated (Nov 14, 2013)
After reading this post, so-called codswallop, I came to know that there exists a PHP library, ReactPHP, which lets PHP behave asynchronously. This was definitely going to improve efficiency while scraping: since phpQuery uses file_get_contents, it has to wait for the response to each request before starting the next. So I borrowed this code, which uses ReactPHP with phpQuery just for parsing the HTML; the task was the same as above.
Great! PHP took just 39.351 sec to complete.
[Figure: nodejs_reactphp_phpquery]

Then somebody on HN pointed out that the default maxSockets value of Node.js's HTTP agent is 5,
and that it should be cranked up to 64 for a fair comparison. So I changed its value to 64:

require('http').globalAgent.maxSockets = 64;

[Figure: reactphp_nodejsmaxsocket-64]
Run times are now almost the same: PHP took 39.351 sec and Node.js took 37.67 sec. I'm not sure about my bandwidth; when I pinged some random European server from my location using speedtest.net, it showed 9.12 Mbps download and 4.45 Mbps upload.
The runtime of PHP has improved. The difference is just 2 sec, not four times as I said earlier.
If somebody comes up with a better way to improve the efficiency of my code, I would be happy to play around with it.
And yes, this article is about which is faster for scraping, Node.js or PHP.
As long as the same thing can be achieved on different platforms, people are going to compare, whether it's "Node.js vs PHP" or "Node.js vs Python" or "Node.js vs Scala" or whatever.
I picked the platforms I was comfortable with, took the best libraries I knew of, and ran my test.
Clearly phpQuery wasn't the best choice, and ReactPHP does a better job, but I don't see anything wrong with the title here.
If someday someone pops up and says "Here, try this library OverReactPhp. It makes your code much faster", I would be happy to run my test again.
GitHub repo

44: php not found [Zend CLI tool error: solved]

Everything was working fine until I changed my OS from Debian 5.0 to Ubuntu 10.04. I had lampp installed in /opt. Although lampp was running, the Zend CLI could not find PHP, even though I did exactly what the Getting Started tutorial said:

  • Extract Zend Framework
  • Edit your .bashrc file
  • Uncomment the following lines in it:

if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi

  • If there is already a .bash_aliases file in your home folder, add the following line to it; if not, create the file and then add the line:

alias zf='path/to/zendframework/bin/zf.sh'

But when you try to check the Zend version with "zf show version", an error pops up: "44: php not found".
The missing package is php-cli, which is used to run PHP scripts from the terminal/shell. Install php-cli from your package manager, then restart lampp (if needed). Now open a terminal and check the Zend version again with "zf show version"; you should see the version of Zend Framework.

Web Scraping

Web scraping is a technique for extracting the required information from a web page. It can be done by visiting the pages and copy-pasting, but that is of course not practical when you have thousands of items to extract. Web scraping is made easy with a little scripting. Here I have an example of a web scraper that scrapes the names of the top 250 movies from the IMDb ratings chart.

This scraper is written in Python using the "mechanize" and "BeautifulSoup" modules. The mechanize module is used to browse web pages from Python. We could use "urllib" instead of mechanize, but urllib cannot handle anti-bot measures/robots.txt, so mechanize provides some extra features over urllib. Since imdb.com has anti-bot measures, we cannot browse it using urllib. BeautifulSoup is one of the most popular and feature-rich modules for parsing HTML content.
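The parsing half of the job can be sketched with nothing but the standard library's html.parser (BeautifulSoup offers a far richer API, this just illustrates the idea). The HTML snippet and the TitleParser class here are made-up stand-ins for the real IMDb chart markup:

```python
from html.parser import HTMLParser

# Made-up stand-in for the IMDb Top 250 table markup
SAMPLE_HTML = """
<table class="chart">
  <tr><td class="titleColumn"><a>The Shawshank Redemption</a></td></tr>
  <tr><td class="titleColumn"><a>The Godfather</a></td></tr>
</table>
"""

class TitleParser(HTMLParser):
    """Collect the link text inside every td.titleColumn cell."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td' and attrs.get('class') == 'titleColumn':
            self.in_title_cell = True
        elif tag == 'a' and self.in_title_cell:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_title_cell = False
        if tag == 'a':
            self.in_link = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.titles.append(data.strip())

parser = TitleParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)
```

With BeautifulSoup the same extraction collapses to a one-line select over the fetched page, which is why the example script uses it.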

You can download my example from here; you must have those two modules installed to run the script.