php|architect’s Guide to Web Scraping with PHP

This book is no longer published or available for sale. See our Main Book page for a list of available books.

Despite all the advancements in web APIs and interoperability, it’s inevitable that, at some point in your career, you will have to “scrape” content from a website that was not built with web services in mind. And, despite its sometimes less-than-stellar reputation, web scraping is usually an entirely legitimate activity: for example, capturing data from an old version of a website for insertion into a modern CMS.

This book, written by scraping expert Matthew Turland, covers web scraping techniques and topics that range from the simple to the exotic, using a variety of technologies and frameworks:

Understanding HTTP requests
The PHP HTTP streams wrapper
cURL
pecl_http
PEAR:HTTP
Zend_Http_Client
Building your own scraping library
Using Tidy
Analyzing code with the DOM, SimpleXML and XMLReader extensions
CSS selector libraries
PCRE pattern matching
Tips and Tricks
Multiprocessing / parallel processing

Book Details

Title	php|architect's Guide to Web Scraping with PHP
ISBN	978-0981034515
Pages	192
Digital Formats	PDF, ePub, Mobi
Author	Matthew Turland
Date Published	September 1, 2010
Print Dimensions	7.5" x 9.2"
Language	English
