The Scrape With Claude Playbook
I'm Chris. I do marketing and GTM engineering, and scraping with Claude is the thing people ask me about most, so I wrote it all down. If you're here, I'm assuming you want to get better with Claude, and you're probably in a sales, marketing, or go-to-market kind of role. I promise, if you follow these steps and just put in the work, you'll do things you never knew were possible. And it's fun. I sincerely hope this is helpful. I've left nothing out.
You sell booking software to golf courses.
There are about 16,000 golf courses in the United States. Some of them are your customers. A few thousand of them should be. Somewhere in that 16,000 is every deal you'll close this year, and you don't know which ones they are.
Here's what you actually need. The courses still running an outdated tee-time system. The name of the GM or owner. A work email that doesn't bounce.
All of it is public. It's sitting on their own websites and in public records right now, today. Nobody hid it. The problem was never access. The problem was that reading 16,000 websites used to be a job for an intern army.
It isn't anymore. This is what comes back when you point Claude at it:
| course | state | type | contact | email |
|---|---|---|---|---|
| Pebble Beach Golf Links | CA | Resort | ✓ found | ✓ verified |
| Bethpage State Park | NY | Municipal | ✓ found | ✓ verified |
| Erin Hills | WI | Daily Fee | ✓ found | ✓ verified |
| Winged Foot Golf Club | NY | Private | ✓ found | no public email |
| Chambers Bay | WA | Municipal | ✓ found | ✓ verified |
No intern army. You describe what you want, Claude reads the websites, and rows like these stack up while you do something else. Set it running overnight and you wake up to the whole market in a spreadsheet. I've enriched 5,000 companies overnight for about $12. Verified work emails cost me about three cents each. Those aren't projections, they're receipts from real runs, and the exact steps behind both numbers are in this book.
This playbook is everything I know about doing this. Three reasons to scrape, told through one story start to finish. A toolkit of seven ways to get data off any website, for when sites cooperate and when they don't. Three copy-paste playbooks: find the companies, find the contacts, run the outbound. And a five-minute setup that leaves you with a working scraping machine. No gate on any of it.
pebblebeach.com, captured June 2026. One of the 16,000. Everything this book pulls is sitting on pages like this. Shown for editorial illustration only.
What you need
Everything in this book runs in a terminal with Claude Code, which sounds scarier than it is. A regular Claude subscription is all you need. A few free tools help along the way, and a couple of paid ones show up in the later sections, but nothing costs a dime to start. Full setup steps are in Quick Start, and they take about five minutes.
People scrape for three reasons. To build a list. To find a signal. To feed their outbound. That's the whole taxonomy, and most people only ever use the first one.
We're going to do all three with one story: you sell booking software to golf courses. Same market, same data, start to finish.
And here's what we're not going to do. We're not going to teach you to write code. We're not going to make you read technical documentation. That stuff is irrelevant now. By the end of this book you'll be genuinely dangerous scraping with Claude Code, and you'll have done it in plain English.
1. The List
Every outbound motion starts the same way: who are we even talking to?
The old answer was buy the list. Pull an export from one of the big sales databases, pay per contact, get the same rows every competitor already bought. The data is months old, the titles are wrong, and the "golf" filter returns driving ranges, mini golf, and a guy who sells golf carts.
The scraped answer is different. You build the list from the source, and the shape of the work is always the same three steps.
Companies first. Get every golf course in the country into one table. Name, address, city, state, website. There are a ton of ways to find companies: scrape Google Maps, industry directories, association member lists, anywhere your market shows up online. Claude cleans up the duplicates and filters out whatever junk gets caught in the net.
Then people. For each course, who runs it? Course websites have staff pages, "meet the team" pages, contact pages. Claude reads each one and pulls the GM, the director of golf, the owner. No staff page? There are fallbacks, and they're in the toolkit.
Then contact info. A name without an email doesn't get you anywhere. Finding emails cheaply comes down to one rule: free methods first, paid methods last, verify everything. Start by scraping the course's site for published emails. Then work out the company's email pattern and guess the rest. Most companies use one pattern, like first.last@, so a single real email tells you how to spell everyone else's. Only when the free methods miss do you pay for a lookup, and that costs pennies. And every address gets verified before it touches your CRM, no matter where it came from. All in, I land around three cents per verified contact. The full recipe is later in this book, prompts included.
| # | course | state | type | booking system | gm name | email | verified |
|---|---|---|---|---|---|---|---|
| 1 | Pebble Beach Golf Links | CA | Resort | (masked) | (masked) | @pebblebeach.com | ✓ |
| 2 | Bethpage State Park | NY | Municipal | (masked) | (masked) | @parks.ny.gov | ✓ |
| 3 | Erin Hills | WI | Daily Fee | (masked) | (masked) | @erinhills.com | ✓ |
| 4 | Winged Foot Golf Club | NY | Private | · | (masked) | no public email | · |
| 5 | Chambers Bay | WA | Municipal | (masked) | (masked) | @chambersbay.com | ✓ |
| 6 | Pinehurst Resort | NC | Resort | (masked) | (masked) | @pinehurst.com | ✓ |
| 7 | Torrey Pines Golf Course | CA | Municipal | (masked) | (masked) | @sandiego.gov | ✓ |
Sample rows. Names, emails, and booking system vendors masked. The highlighted column is the signal field from the next chapter.
What comes out the other end is technically just a spreadsheet. But load it into your CRM and it's something bigger: the foundation of your entire go-to-market engine. Sales calls it. Marketing emails it, mails it, and runs ads against it. Every campaign you launch for the next year sits on top of this data, and quality data in means quality pipeline out. Nobody else has this exact spreadsheet, and that gap is the whole reason the rest of this book works.
One evening of this beats a month of an SDR copy-pasting from Google. That's not a knock on SDRs. It's the whole point. People should do the part people are good at, which is the next chapter.
2. The Signal
A list tells you who exists. A signal tells you who to call this week, and what to say when they answer.
Here's the one that matters in our golf story. Every course that takes online tee times runs some booking system, and their own website tells you which one. The booking widget, the page their tee sheet loads from, the fine print at the bottom of the reservation page. You don't have to guess and you don't have to ask. The site announces it to anyone who looks.
The fine print at the bottom of a booking widget. The vendor name is masked here. On the real page it's printed in full, on every course, for anyone who looks.
If you sell booking software, sit with that for a second. The single most important fact about a prospect, whether they run the legacy system you replace or the modern one you lose to, is printed on their own website. Scrape that one field across all 16,000 courses and your flat list becomes a ranked pipeline. Courses on dying systems float to the top. Courses that just signed with your competitor drop out before you waste a sequence on them.
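If you're curious what that ranking step looks like once the booking system field is filled in, here's a minimal sketch. The vendor names and the two buckets are placeholders I made up for illustration, not real products. Swap in the systems you actually replace and the ones you lose to.

```python
# Hypothetical vendor buckets: replace with the systems in your own market.
LEGACY_SYSTEMS = {"OldTeeSheet"}      # systems you replace
COMPETITOR_SYSTEMS = {"RivalBooker"}  # systems you lose to


def rank_by_signal(courses):
    """Order courses for outreach: legacy systems first, competitor-run courses removed."""
    ranked = []
    for course in courses:
        system = course.get("booking_system")
        if system in COMPETITOR_SYSTEMS:
            continue  # already on a competitor: don't spend a sequence here
        priority = 0 if system in LEGACY_SYSTEMS else 1
        ranked.append((priority, course))
    # sorted() is stable, so courses keep their original order inside each bucket
    return [course for _, course in sorted(ranked, key=lambda pair: pair[0])]


courses = [
    {"course": "Course A", "booking_system": "RivalBooker"},
    {"course": "Course B", "booking_system": None},
    {"course": "Course C", "booking_system": "OldTeeSheet"},
]
for row in rank_by_signal(courses):
    print(row["course"], row["booking_system"])
```

Courses with an unknown system stay on the list, just below the legacy ones, so nothing is thrown away because a scrape missed a field.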
That's what a real signal is, and it's worth naming because the lead-gen world is full of fake ones. The "personalization" industry will sell you an opener like "saw you went to UW, go Huskies." That's not a signal. That's proof you ran a LinkedIn scrape, and the prospect knows it the second they read it. A real signal is a fact about their business that changes what you would say to them. One is flattery. The other is relevance. Relevance is the only one that books meetings.
And here's where it gets better. The same scrape that reads the booking system can read what kind of course this is. The industry segments courses into types: Daily Fee, Municipal, Resort, Semi-Private, Private, Multi-Course Operators. A course's website tells you which one it is. The rate card, the "membership" tab or the lack of one, the wedding venue page. Claude sorts all 16,000 into those buckets as part of the same pass.
Now, why bother? Because the type changes the pitch completely.
A daily-fee or municipal course lives on filled tee sheets. Their KPI is booking dollars, and their nightmare is Tuesday at 2pm with nobody on the first tee. Your pitch there is yield: fill the empty slots, capture the last-minute cancellations, more confirmed dollars per day.
Now take that exact pitch to a private club and watch it die. Members already paid. The tee sheet isn't a revenue lever, it's a member benefit. Pitch a private club on "fill your empty slots" and you've announced you don't know what business they're in. Their KPI is member experience and retention, so that's the pitch.
A resort cares about the guest's whole stay and the package upsell. A semi-private course is a blend of both worlds. A multi-course operator cares whether your thing works across nine properties without nine logins.
Same product. Six different cold emails.
Notice what just happened, because this is the most important paragraph in the book. The scrape did the part that scales: it read 16,000 websites and sorted every course into a bucket. What it didn't do is know what each bucket cares about. The difference between how a private club thinks and how a municipal course thinks is not something you should trust AI to figure out for you, at least not yet. That's where you come in. You're the expert on your market. Your understanding of the customer is what turns a pile of scraped data into a pipeline, and there is no substitute for it. The scrape gets you to the doorstep. You can't outsource knowing which door you're standing at.
Everyone selling AI lead-gen right now is selling the opposite story, that the tool does it all. The tool does the reading. You do the knowing. When you bring real understanding of your market, scraping stops being a data chore and becomes the most unfair advantage in your stack.
Same move works anywhere, by the way. HVAC companies, law firms, anyone. Want to see another example? I wrote up the same technique on dentists.
3. The Outbound
So you have the list and you have the signal. Sixteen thousand courses, typed and ranked, with names and verified emails. Most people pour that into a cold email tool and call it done.
That's one channel. The same table powers four.
Cold email. The obvious one, but the signal changes what you send. You're not blasting 16,000 courses with one template. You're sending the yield pitch to daily-fee courses on legacy systems, the member-experience pitch to private clubs, the multi-property pitch to operators. Six small campaigns that each read like you know them, because you do.
Hi {{first_name}},
{{course_name}} is a {{course_type}} course, and for courses like yours, Tuesday afternoon tee sheets are the hardest dollars to recover.
We help daily-fee courses on {{booking_system}} capture last-minute cancellations and fill the empty slots automatically, without discounting your walk-up rate.
Want a 90-second video of it working on {{booking_system}}? I'll send it over, no call needed.
The yield pitch from the signal chapter, as a campaign. Every colored chip is a field the scrape filled in.
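For the curious, filling those fields is a small job. Here's a sketch, assuming each row of the table is a dictionary and the template uses the same `{{field}}` syntax as the email above. The pitch wording per course type is my own placeholder based on the signal chapter; the real copy is yours.

```python
import re

# Hypothetical one-line pitches per segment; write your own.
PITCH_BY_TYPE = {
    "Daily Fee": "fill empty tee times and capture last-minute cancellations",
    "Municipal": "fill empty tee times and capture last-minute cancellations",
    "Private": "improve member experience and retention",
}


def render(template, row):
    """Replace each {{field}} with row[field]; raise if a field is missing or empty."""
    def fill(match):
        field = match.group(1)
        value = row.get(field)
        if not value:
            raise ValueError(f"missing field {field!r}")
        return str(value)

    return re.sub(r"\{\{(\w+)\}\}", fill, template)


template = (
    "Hi {{first_name}}, {{course_name}} is a {{course_type}} course. "
    "We help courses like yours {{pitch}}."
)
row = {"first_name": "Sam", "course_name": "Example Links", "course_type": "Private"}
row["pitch"] = PITCH_BY_TYPE[row["course_type"]]
print(render(template, row))
```

Raising on an empty field is deliberate: a row with a missing name should be skipped, not sent as "Hi ,".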
Call lists. Your BDRs stop dialing alphabetically. The list is ranked by signal, so they call courses on dying systems first, and the opener writes itself: they already know the system, the course type, and what that type cares about. Put those three facts at the top of every call sheet. The first 15 seconds of the call stops being "who are you" and starts being "how did you know that."
Direct mail. Email gets ignored at the top of the market, but a GM opens a FedEx envelope. The scrape already grabbed each course's logo and photos, so use them. Imagine opening a piece of mail and seeing your own logo and a photo of your own course inside. That's the piece that doesn't hit the trash. Logo enrichment is one extra field in the scrape, and it's the difference between mail that looks like marketing and mail that looks like it was made for you.
Paid audiences. Upload the contact list to the ad platforms as a custom audience and your ads only show to the 16,000 people who can actually buy. No targeting guesswork, no spray. You're also handing the pixel a head start: instead of burning budget while Meta or Google slowly learns who your buyer is, it starts from your exact list. Faster learning, cheaper clicks, better return on the same spend. Run it as air cover so that when your email lands, your name is already familiar.
Four channels, one table, one scrape. The list is the asset. The signal is the aim. The outbound is everything it touches.
One thing to know going in: this is not a one-and-done. Lists go stale, sites change, new signals become worth adding. The machine compounds as long as someone keeps it running.
Part One was why. This is how.
Seven ways to get data off a website, ordered from easiest to heaviest. The order matters: it's the order you should try them on any new site. Most of the web gives up its data to the first one. You only escalate when a site fights back, because each step down the list adds a little cost or setup.
This is the one part of the book where I slow down and get precise, because once you understand what each of these actually does, you'll know which one to reach for without asking anyone. Each section is the short version: what it is, when to use it. The full walkthrough, with complete prompts you can copy and paste, lives on its own page, linked at the end of each section.
Read the HTML
Every website is just code. Your browser downloads that code and paints it into the page you see. And Claude is extremely good at reading code.
Most websites will hand over their code to a single direct request, the same request your browser makes when you visit. So the method is three steps. Grab the page's code. Trim out the obvious junk, the menus and ads and styling. Then hand what's left to Claude and ask for exactly what you want.
That ask is where your customer knowledge starts earning. You're not limited to the obvious fields. Brief Claude like you'd brief a sharp assistant: "Read this golf course's website and tell me which booking software they use, whether the course is private, municipal, or resort, and grab the name, address, and phone number." You decide what's worth knowing, because you know the market. Claude just goes and reads.
This is your workhorse. It covers more of the web than you'd expect, it costs nothing, and it doesn't break when a site changes its layout, because Claude reads pages by meaning, not by position. Old scrapers were rigid instructions like "grab the third box from the top," and they shattered on every redesign. Claude reads the page like a person. Most of the golf machine in Part One runs on this one method.
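You never have to write this yourself, but if you want to see the three steps as code, here's a minimal sketch using only Python's standard library. The list of tags to skip and the function names are my assumptions. The last step, sending the prompt to Claude, is left out because in Claude Code you just paste or describe it.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Tags whose contents are almost never data. The footer is kept on purpose:
# booking vendors are often named in the fine print there.
SKIP_TAGS = {"script", "style", "nav", "noscript", "svg"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_html(url):
    """Step 1: grab the page's code with a single direct request."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def trim_html(html):
    """Step 2: drop menus, scripts, and styling; keep the readable text."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


def build_prompt(page_text):
    """Step 3: wrap the trimmed text in the ask you would give Claude."""
    return (
        "Read this golf course's website and tell me which booking software "
        "they use, whether the course is private, municipal, or resort, and "
        "grab the name, address, and phone number.\n\n" + page_text
    )


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home | Rates</nav><p>Book a tee time</p>"
    "<footer>Powered by ExampleBooker</footer>"
    "<script>var x = 1;</script></body></html>"
)
print(trim_html(sample))
```

Trimming matters mostly for cost: the less junk you hand over, the fewer tokens each page burns.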
Full walkthrough: Turn any website into a clean list
Bring a real browser
Some websites send you an empty shell of code, and then JavaScript, code that runs inside your browser, builds the actual content a moment later. Booking calendars work this way. So does anything with a loading spinner. A direct request downloads the JavaScript but never runs it, so what you get back is the empty shell. No data.
The fix: Claude can drive a real web browser. It opens the page the way you would, the JavaScript runs, the content appears, and then Claude reads what's actually on screen. Slower than a direct request, and it needs a small one-time setup, which is exactly why it's second on the list and not first.
Same page twice. The direct request gets the spinner. The real browser waits for the JavaScript, then reads the loaded tee sheet.
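Here's a rough sketch of how you might decide when to escalate, plus the browser step itself. The empty-shell check is a heuristic of my own with an arbitrary 200-character threshold, not a rule from any tool. The browser function assumes Playwright for Python is installed, which is why its import sits inside the function.

```python
import re


def looks_like_empty_shell(html, min_chars=200):
    """Heuristic: almost no visible text once scripts, styles, and tags are stripped."""
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", without_code)
    return len(" ".join(text.split())) < min_chars


def read_with_browser(url):
    """Load the page in headless Chromium, let the JavaScript run, return the on-screen text."""
    # Requires: pip install playwright, then: playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for network activity to settle
        text = page.inner_text("body")
        browser.close()
        return text


shell = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'
print(looks_like_empty_shell(shell))
```

The idea is to try the direct request first and only call `read_with_browser` when the shell check trips, so the slow path runs on the few pages that need it.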
Full walkthrough: Scrape the sites that need a real browser
Catch the JSON
Right click any web page and hit Inspect. That messy panel that pops open is your browser's developer tools, and it's a goldmine if you know where to look.
Click the Network tab and reload the page. You're now watching every file the page downloads as it builds itself. Most of it is noise, images and fonts and styling. But on data-heavy sites, somewhere in that list is a file in a format called JSON. JSON is how websites store data before dressing it up for display: clean, labeled, structured. The pretty page is for you. The JSON is for the site's own code. Both arrive in your browser, and nothing stops you from reading the neat version.
Grab that file's address, hand it to Claude, and you skip page-reading entirely. The data comes back already structured and complete. And here's the kicker: the JSON is often richer than the page. Sites routinely send more fields than the designer chose to display, which means the JSON can hold data you literally cannot see in the browser.
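To make that concrete, here's a small sketch that turns a JSON payload into spreadsheet rows. The payload, the `courses` key, and every field name in it are invented for illustration. Real sites name things however they like, and nested records would need flattening first.

```python
import csv
import io
import json


def json_to_rows(payload, list_key):
    """Pull the list of records out of a JSON payload and collect every field name seen."""
    records = json.loads(payload)[list_key]
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    return records, fieldnames


def rows_to_csv(records, fieldnames):
    """Write the records as CSV text, leaving blanks where a record lacks a field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical payload: the second record carries a field the page might never display.
payload = (
    '{"courses": ['
    '{"name": "Example Links", "state": "WI", "holes": 18},'
    '{"name": "Sample Dunes", "state": "CA", "manager_email": "gm@example.com"}'
    "]}"
)
records, fieldnames = json_to_rows(payload, "courses")
print(rows_to_csv(records, fieldnames))
```

Collecting field names across every record, instead of trusting the first one, is what surfaces the extra fields the page never shows.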
Full walkthrough: Get the data a site loads behind the scenes
The backlink trick
Some software products give every customer their own web address. A booking tool might host each course's tee sheet at a subdomain like yourcourse.bookingtool.com, or at a unique page like bookingtool.com/book/yourcourse. The moment a vendor does that, their entire customer base becomes public record, because the SEO industry built indexes, tools like Ahrefs, that track every address on the web and who links where.
So you search the index backwards. "Show me every subdomain of bookingtool.com." Out comes the customer list. Not a sample of it. The whole base.
The catch: this only works when the vendor hands out customer subdomains or unique links, so check before you commit. Find one course you know runs the platform, look at where their booking page actually lives, and if their name is in the web address, the trail exists.
An Ahrefs-style export. bookingtool.com is a placeholder vendor, the course subdomains are illustrative. The shape is exactly what comes back: the whole customer base in one pull.
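Once you have an export like that, turning it into a customer list is mostly string handling. Here's a minimal sketch, assuming the export is a plain list of full URLs with the scheme included. `bookingtool.com` and the `/book/` path are the same placeholders used above, not a real vendor.

```python
from urllib.parse import urlparse


def customer_slugs(urls, vendor_domain, path_prefix="/book/"):
    """Extract customer identifiers from vendor-hosted URLs (subdomain or unique path)."""
    slugs = set()
    for url in urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host.endswith("." + vendor_domain):
            subdomain = host[: -len(vendor_domain) - 1]
            if subdomain != "www":  # the vendor's own marketing site
                slugs.add(subdomain)
        elif host == vendor_domain and parsed.path.startswith(path_prefix):
            slug = parsed.path[len(path_prefix):].strip("/").split("/")[0]
            if slug:
                slugs.add(slug.lower())
    return sorted(slugs)


urls = [
    "https://yourcourse.bookingtool.com/teetimes",
    "https://yourcourse.bookingtool.com/",
    "https://www.bookingtool.com/pricing",
    "https://bookingtool.com/book/othercourse/",
    "https://example.com/",
]
print(customer_slugs(urls, "bookingtool.com"))
```

The slugs still need matching back to real course names, which is a job for Claude and your company list.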
Full walkthroughs: Build a competitor's customer list and Find a website's hidden pages
When a site doesn't like bots
Nothing in this book is sneaky. You're reading public pages. But some websites block anything that looks automated, even when a human could read the same page freely. You make a direct request, you get a wall.
Luckily there are paid services built for exactly this. They route your request through normal residential internet connections, so it arrives at the site looking like a regular visitor, and the site answers normally. It costs a few cents per page.
Full walkthrough: Reach the sites that try to block you
Cheap models at scale
This one isn't about getting data. It's about not overpaying to process it.
When you're reading one page, use Claude and don't think twice. When you're classifying 16,000 pages, most of that is easy, repetitive work. "Is this course private or public" is not a hard question, and you don't need a frontier model to answer it 16,000 times. There are small, fast, open-source models that answer easy questions for fractions of a penny per page, and services like OpenRouter put hundreds of them behind one account.
The pattern: build the scrape with Claude, then let the cheap model run the bulk of it. Claude does the thinking, the workhorse model does the repetition, and Claude comes back for the judgment calls.
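Here's one way that hand-off could be sketched. It assumes each model is wrapped in a function that takes page text and returns a label. In practice those wrappers would make API calls, but here they're stand-ins so the routing logic is visible. The rule "escalate when the cheap answer isn't one of the allowed labels" is my own simplification.

```python
def classify_in_bulk(pages, cheap_model, strong_model, allowed_labels):
    """Run every page through cheap_model; re-ask strong_model only when the answer is unusable."""
    results = {}
    escalated = 0
    for page_id, text in pages.items():
        answer = cheap_model(text).strip().lower()
        if answer not in allowed_labels:
            # Judgment call: the cheap model hedged or went off-script
            answer = strong_model(text).strip().lower()
            escalated += 1
        results[page_id] = answer
    return results, escalated


# Stand-in models for illustration only; real ones would call an API.
def cheap_stub(text):
    return "Private" if "members only" in text else "unsure"


def strong_stub(text):
    return "municipal"


pages = {"a": "A members only club since 1923", "b": "Owned by the city parks department"}
labels = {"private", "municipal", "resort"}
results, escalated = classify_in_bulk(pages, cheap_stub, strong_stub, labels)
print(results, escalated)
```

Tracking the escalation count tells you whether the cheap model is pulling its weight. If most pages escalate, the question is harder than you thought.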
Rent a prebuilt scraper
Sometimes the scraper you need already exists. Apify is a marketplace of ready-made scrapers for the big public sources: Google Maps, LinkedIn, Indeed, and thousands more. Renting one costs a few dollars per run, and for mainstream sources it's the fastest first draft of a company list money can buy.
Six of the thousands. If the source is big and public, the scraper is already built and rented by the run.
One rule: always use their filters. Pull "golf courses in Texas," never "everything, I'll sort it out later." I've watched an unfiltered run come back three-quarters junk.
The toolkit is ingredients. Here are three full recipes, and they match the three steps you watched in Part One: find the companies, find the contacts, run the outbound. Chain them together and you've built the machine.
Playbook one: find the companies
Goal: every company in your market, in one spreadsheet. For most markets the fastest source is Google Maps, and you don't have to build anything, because the scraper already exists: the Google Maps scraper on Apify.
Two rules before you spend a dollar.
Companies only, never contacts. Apify scrapers will offer to chase down emails and contact info. Don't. It's expensive and the quality is poor. Pull the company name, address, website, and category, and let the next playbook handle contacts for a fraction of the price. Under 2 cents per company is the bar.
Small run first. Scrape your home county before you scrape the country. The first run always catches junk. Search "golf courses" and you'll get golf cart dealers, driving ranges, and a mini golf bar with great reviews. A small run costs pennies and teaches you exactly which categories to filter out, so the big run comes back clean instead of 30% noise. Burn $1 learning, not $50.
When the results land, hand the file to Claude: dedupe it, drop the junk categories you found in the test run, and flag any company with no website, because you can't enrich what you can't read. That spreadsheet is The List from Part One, step one done.
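That cleanup is the kind of thing Claude writes for you in seconds, but here's a sketch of what it amounts to. The column names and the junk categories are assumptions. Yours come from whatever your test run turned up. Duplicates are matched on name plus address rather than website, because several municipal courses can share one city domain.

```python
import re

# Hypothetical junk categories, learned from a small test run.
JUNK_CATEGORIES = {"golf cart dealer", "driving range", "miniature golf course"}


def normalize(value):
    """Lowercase and collapse punctuation and whitespace so near-duplicates match."""
    return re.sub(r"[^a-z0-9]+", " ", (value or "").lower()).strip()


def clean_companies(rows):
    """Drop junk categories, dedupe on name + address, flag rows with no website."""
    seen = set()
    cleaned = []
    for row in rows:
        if normalize(row.get("category")) in JUNK_CATEGORIES:
            continue
        key = (normalize(row.get("name")), normalize(row.get("address")))
        if key in seen:
            continue
        seen.add(key)
        has_site = bool((row.get("website") or "").strip())
        cleaned.append({**row, "needs_website": not has_site})
    return cleaned


rows = [
    {"name": "Example Links", "address": "1 Fairway Dr", "category": "Golf course", "website": "https://examplelinks.example"},
    {"name": "Example Links ", "address": "1 Fairway Dr.", "category": "Golf course", "website": "https://examplelinks.example"},
    {"name": "Cart Barn", "address": "2 Main St", "category": "Golf cart dealer", "website": "https://cartbarn.example"},
    {"name": "Sample Muni", "address": "3 Park Rd", "category": "Public golf course", "website": ""},
]
for row in clean_companies(rows):
    print(row["name"].strip(), row["needs_website"])
```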
Playbook two: find the contacts
This playbook takes the company list and gives you back verified work emails for about three cents each. Most marketers go to Apollo, Hunter, or Clay for this, where the going rate is ten to fifteen cents per verified contact. But those tools aren't magic. Most of their contacts are pulled from publicly available sources, and with Claude and a little creativity you can find the same contacts for a third of the cost or less.
The waterfall is dead simple. Try the cheapest option first, escalate only when it misses. The expensive options sit at the bottom because most contacts never need them.
Stage one is Perplexity Sonar, an AI model with web search built in. Give it the company and the title you want. It comes back with a name and, often, the actual email. Half a cent a call, and it finds most people on its own.
If Sonar comes up empty, scrape the company's own site. Read the pages most likely to hold contacts: /about, /team, /leadership, /contact. Free, and it works for most websites.
Now you've usually got names. The question is emails. If you found even one real email at the company, read the format right off it: first.last, firstlast, flast. Most companies use one pattern, so a single real address tells you how to spell everyone else's. If you got names but no email at all, ask Hunter for the pattern. One cent. Then apply the pattern to every name. Free.
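The pattern step is simple enough to show in full. This sketch covers only the three formats named above and assumes plain single-word first and last names. Hyphens, middle names, and nicknames would need more care. Everything it generates is still a guess until it's verified.

```python
# The three formats named above; add more as you run into them.
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "firstlast": lambda first, last: f"{first}{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
}


def infer_pattern(known_email, first, last):
    """Given one real address and its owner's name, return the matching pattern name, or None."""
    local_part = known_email.split("@")[0].lower()
    for name, build in PATTERNS.items():
        if build(first.lower(), last.lower()) == local_part:
            return name
    return None


def apply_pattern(pattern, first, last, domain):
    """Spell a candidate address for another person at the same company."""
    return f"{PATTERNS[pattern](first.lower(), last.lower())}@{domain}"


pattern = infer_pattern("jane.doe@example.com", "Jane", "Doe")
print(pattern)
print(apply_pattern(pattern, "John", "Smith", "example.com"))
```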
Last stage: SMTP-verify every generated email before it counts. I use Bouncer at a cent per check (MillionVerifier is even cheaper). This step is a requirement, not a suggestion. Unverified emails bounce, and bounces tank your sending domains. Any address that fails verification gets dropped. What survives is a list of verified, real, in-the-inbox-tomorrow contacts.
For the hard targets where everything came back empty, there's one more fallback: a Google search scrape that surfaces the people the earlier stages missed. Slower and a couple cents more, but it catches the stragglers.
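The shape of the waterfall, stripped of the actual services, can be sketched like this. Each stage here is a stand-in function, not a real API call, and the sketch simplifies by treating every stage as an independent lookup. The per-stage costs in cents are the ones quoted above. This is an illustration of the cheapest-first idea, not the packaged skill.

```python
def run_waterfall(company, stages, verify, verify_cost=1.0):
    """Try stages cheapest-first; return the first email that passes verification, plus cents spent."""
    spent = 0.0
    for name, cost_cents, find_email in stages:
        spent += cost_cents
        email = find_email(company)
        if not email:
            continue  # this stage missed: escalate to the next one
        spent += verify_cost  # every candidate gets verified, whatever its source
        if verify(email):
            return {"company": company, "email": email, "source": name, "cents": spent}
    return {"company": company, "email": None, "source": None, "cents": spent}


# Stand-in stages for illustration: (name, cost in cents, lookup function).
stages = [
    ("sonar", 0.5, lambda company: None),
    ("site_scrape", 0.0, lambda company: "gm@example.com"),
    ("hunter_pattern", 1.0, lambda company: "guess@example.com"),
]


def verify_stub(email):
    return email == "gm@example.com"


result = run_waterfall("Example Links", stages, verify_stub)
print(result)
```

Recording the source and the cents per contact is what lets you see which stage is doing the work and where the money goes.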
You don't have to build any of this. I packaged the whole waterfall as a Claude Code skill:
Drop in your API keys, point it at the spreadsheet from playbook one, get back verified contacts. The full write-up with a live run and per-stage costs is at chris-as-is.com/projects/three-cent-contacts, and the code is on GitHub.
Playbook three: run the outbound
You have a list and verified emails. Here's the cold email setup I'd run with it. If I could only keep one channel for the rest of my career, it would be cold email. It's fast, it's cheap, and it creates pipeline. You need three things: prospect emails, mailboxes to send from, and a tool to send with.
The emails you already have. That was playbooks one and two, at under 5 cents per contact. (If you skipped ahead, you can also just buy a list. Apollo, Hunter, Clay and others sell contacts for 10 to 15 cents each. Fine to start, but everyone else bought the same rows.)
Get mailboxes. Start with 10 to 20, depending on budget. They cannot be on your real domain. Cold email puts a domain's reputation at risk, and you don't gamble with the domain your business runs on. If you're fishsticks.com, buy alternate domains like getfishsticks.com and tryfishsticks.com and create the mailboxes there. You can set them up yourself in Google Workspace (admin.google.com) or buy them through your sending tool.
Get a sending tool. Smartlead, Instantly, Lemlist, Woodpecker, they all work and they're all under $100 a month. Connect every mailbox. You'll have to log into each one to authenticate, it's a slog, just do it. Then set the mailboxes to warm up for two weeks minimum. Warmup is the tool quietly sending and answering mail between real inboxes so Google learns you're legit. Don't skip it and don't rush it.
Congratulations, that's 90% of the work, and none of it was writing. The copy is on you, and you're better equipped for it than you think, because the whole signal chapter was secretly a copywriting lesson: you already know what each segment cares about.
The sending rules, and these aren't suggestions:
Don't track opens. Don't track clicks. No links, no images. All three hurt deliverability.
One email to everyone on the list, once a month. One touch, no fancy sequences. Start at 10 to 15 emails a day and raise it after a week or two.
Judge yourself on reply rate and nothing else. Half a percent is acceptable. One percent is great. Out-of-office replies don't count.
Hi {{first_name}},
Saw {{course_name}} is a {{course_type}} course. For courses like yours, empty weekday afternoons are the main thing to fix.
We help {{course_type}} courses on {{booking_system}} fill those slots automatically. Want the 90-second video?
No links, no images, no tracking. Plain text and the fields the scrape filled. That's the whole trick.
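Two bits of arithmetic are worth doing before you send anything. This sketch assumes the daily volume is per mailbox, which is my reading and not something stated above, and it counts every calendar day as a sending day.

```python
import math


def days_to_touch_list(list_size, mailboxes, per_mailbox_per_day):
    """How many sending days one full pass over the list takes."""
    return math.ceil(list_size / (mailboxes * per_mailbox_per_day))


def reply_rate(sent, replies, out_of_office):
    """Reply rate with out-of-office auto-replies excluded."""
    return (replies - out_of_office) / sent


print(days_to_touch_list(16000, 20, 15))
print(reply_rate(2000, 25, 5))
```

Under those assumptions, 20 mailboxes at 15 emails a day need 54 days for one pass over 16,000 courses, which is why the volume has to come up after the first week or two if you want a monthly touch.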
This setup gets you 80% of the performance available, out of the gate. There are upgrades (private sending infrastructure, warmup pools, fancier syntax), but you don't need any of them to get results. There's no reason you can't touch every prospect in your market, every single month.
Quick Start: a scraping machine in five minutes
Three steps and you have everything this book uses.
1. Install Claude Code.
Open the Terminal app and paste:
2. Sign in.
Type claude and hit enter. It walks you through logging in with your regular Claude account. No API keys, no configuration.
3. Install the scraping tools.
You're now talking to Claude. Paste this and let it do the work:
That one paste gives Claude a real browser it can drive (Playwright, for the pages that build themselves with JavaScript) and a stealth browser (CloakBrowser, for the pages that act differently when they smell a bot). You won't need either on day one, plain page grabs cover most of the web, but now they're sitting there for the day a site fights back. And no, you don't need to install anything else by hand. If something's missing, Claude installs it and tells you.
That's the whole setup. From here you just talk. Paste in a URL, describe the spreadsheet you want, and go. The first walkthrough is the best first run, and the three playbooks chain it into the machine: find the companies, find the contacts, run the outbound.
If you run your first scrape and it works, tell someone. Better, tell me: @chris_as_is. First-scrape screenshots make my day.
Do it yourself, or don't
That's the playbook. No held-back chapter, no $199 "real" version. Everything you need to build this for your market is on these pages, and plenty of readers will do exactly that. If that's you, go. Send me what you build, I genuinely want to see it.
If you'd rather not run it yourself, that's the other door. Your understanding of your market can't be outsourced, but the scraping can. It's just work, and it's the work I do all day. You bring the knowing, I'll build the machine.
Either way, you now know the thing the data brokers hoped you wouldn't figure out: it was all public the whole time.
Last verified: June 2026. Scraping breaks. When a play stops working, I fix the page. That's the point of a living playbook.