Jon Aquino's Mental Garden

Engineering beautiful software jon aquino labs | personal blog

Friday, February 25, 2005

Using lynx to scrape 100 random company names from the web

Run this command 100 times to build a list of 100 company names, e.g. for test data. Each run appends the matching line of the page to the output file:

lynx -source "http://adactio.com/extras/newmediagenerator/" | grep h1 >> c:\junk3\g.txt
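Rather than retyping the command, the repetition can be scripted. The following is only a rough sketch, assuming a POSIX shell (such as the one Cygwin provides) and assuming the generated name sits on a single line inside an h1 element, as the grep above suggests. The function names extract_h1 and collect_names, and their parameters, are made up for illustration.

```shell
#!/bin/sh

# Keep only lines containing an h1 tag, then strip the HTML tags and
# surrounding whitespace so that just the text remains.
# Assumption: the name and its h1 tags are on one line.
extract_h1() {
    grep -i '<h1' | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Fetch the page repeatedly, appending the extracted text to a file.
# $1 = URL to fetch, $2 = number of fetches, $3 = output file
# (all three are hypothetical parameters for this sketch).
collect_names() {
    url=$1
    count=$2
    outfile=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        lynx -source "$url" | extract_h1 >> "$outfile"
        i=$((i + 1))
    done
}
```

After sourcing the script, a call such as `collect_names "http://adactio.com/extras/newmediagenerator/" 100 names.txt` would make 100 requests. Stripping the tags is an addition of this sketch; the original command keeps the raw HTML line.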

I learned the lynx/grep trick from one of my favourite books, "The Pragmatic Programmer". It got me started on the wonderful world of Unix text tools (which are available for Windows in the Cygwin package), and it also turned me on to the XEmacs text editor, a dangerous yet powerful tool for manipulating text.
