Scraping the Web With Goutte

Some months back I read the article "php|architect's guide to Web Scraping with PHP – Don't let the title fool you". It was a review by Cal Evans of the book by Matthew Turland (@elazar). I did not read the book itself, though. Some days later I came across Goutte, a simple web scraper by @fabpot (Fabien Potencier).

Then, two days back, I was looking for a way to download my friends' contact details from Facebook. I did not find any means of getting the mobile numbers or email addresses. I looked at the API, but it did not provide that kind of functionality. So what to do? If I had the emails and phone numbers, I could sync them with my Gmail and, through it, with my Android handset. I did not find anything; maybe other software exists for this. But I simply decided to learn scraping rather than keep looking for something else.

Goutte is a simple library which can also be downloaded as a phar file, so all you need to do is add a require for the phar file. It uses some of the Symfony components and the Zend HTTP component. You can log in, select links, click on them, filter the content, fetch data, etc.

I was able to get my script working by going through the resources below:

The README at https://github.com/fabpot/Goutte
http://www.phparch.com/2010/04/four-new-php-5-3-components-and-goutte-a-simple-web-scraper/
http://fabien.potencier.org/article/42/parsing-xml-documents-with-css-selectors
Also the API :)

It's really a powerful PHP scraping tool. And yes, it's easy if you know how CSS selectors work. Download Goutte from https://github.com/fabpot/Goutte or using wget https://raw.github.com/fabpot/Goutte/master/goutte.phar. Start scraping :)
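To give a rough idea of the log in, click, filter and fetch flow mentioned above, here is a minimal sketch along the lines of the Goutte README. It is not my actual script: the URL, the button and link labels, the form field names and the CSS selector are all made-up placeholders, and it assumes goutte.phar sits next to the script.

```php
<?php
// Assumption: goutte.phar was downloaded into the same directory as this script.
require_once __DIR__ . '/goutte.phar';

use Goutte\Client;

$client = new Client();

// Hypothetical login page; replace with the site you are scraping.
$crawler = $client->request('GET', 'http://example.com/login');

// Find the form by its submit button label (placeholder label) and submit it.
// The field names 'email' and 'password' are assumptions about that form.
$form = $crawler->selectButton('Log in')->form();
$crawler = $client->submit($form, array(
    'email'    => 'me@example.com',
    'password' => 'secret',
));

// Follow a link by its visible text (placeholder text).
$link = $crawler->selectLink('Friends')->link();
$crawler = $client->click($link);

// Filter with a CSS selector (placeholder) and pull out text and href of each match.
$rows = $crawler->filter('ul.friends li a')->extract(array('_text', 'href'));

foreach ($rows as $row) {
    list($name, $href) = $row;
    echo $name, ' => ', $href, PHP_EOL;
}
```

Once the selector matches the markup of the page you are after, the same few calls (request, submit, click, filter) cover most of what a simple scraper needs.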

goutte, php, scraping