Web Scraping/Crawling
Web Scraping/Crawling is viaSocket’s built-in plug that lets you pull data from websites directly into your workflows.
Use it to scrape specific details like product prices or job listings, crawl entire sites for bulk data, keep your information updated automatically, and feed it into any app—no copy-pasting needed.
What’s the difference?
Web Scraping → Extracts data from a single, specified URL.
Web Crawling → Explores and collects data from multiple pages by following links.
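The plug handles both modes for you, but the distinction is easy to see in code. The sketch below (Python standard library only, with a made-up sample page) shows a scraper pulling data out of a single page versus a crawler collecting the links it would follow to further pages:

```python
from html.parser import HTMLParser

# A hypothetical page, standing in for a fetched URL.
SAMPLE = """
<html><body>
  <h1>Workflow</h1>
  <p>A workflow is a sequence of steps.</p>
  <a href="/wiki/Process">Process</a>
  <a href="/wiki/Automation">Automation</a>
</body></html>
"""

class Scraper(HTMLParser):
    """Scraping: extract data (here, h1 headings) from one page."""
    def __init__(self):
        super().__init__()
        self._tag = None
        self.headings = []
    def handle_starttag(self, tag, attrs):
        self._tag = tag
    def handle_endtag(self, tag):
        self._tag = None
    def handle_data(self, data):
        if self._tag == "h1" and data.strip():
            self.headings.append(data.strip())

class LinkCollector(HTMLParser):
    """Crawling: gather links so the next pages can be visited in turn."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

s = Scraper()
s.feed(SAMPLE)
c = LinkCollector()
c.feed(SAMPLE)
print(s.headings)  # data extracted from this one page
print(c.links)     # pages a crawler would visit next
```

Scraping stops after the one page; crawling repeats this loop on every collected link.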
When to use it
Use this step when you want to:
Track product prices or availability on e-commerce sites.
Collect news headlines or articles from blogs and publishers.
Monitor competitor listings or offers.
Gather contact details, job postings, or any structured information.
How to add web scraping/crawling steps
1. Add the action step
In the workflow editor, click + to add an action step.
From Built-in Plugs, select Web Scraping/Crawling.
Choose one option:
Web Scraping → Scrape data from a specific URL.
Web Crawling → Crawl links starting from a URL.
2. If you select web scraping

• Enter URL → Type the exact web address (e.g., https://en.wikipedia.org/wiki/Workflow).
Optional Fields:
Include Tags → Add HTML tags or CSS classes to scrape (e.g., h1, p, .main-content).
Main Content Only →
Yes: Scrape only key content (no ads/sidebars).
No: Scrape the full page.
Select Proxy Type →
Basic: Standard proxy.
Stealth: Advanced, less detectable.
Auto: Automatically managed.
Custom Input: Enter your own proxy details.
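The Include Tags field is the key filter here. viaSocket applies it internally, but conceptually it works like the small stdlib sketch below (the class name and sample page are illustrative, not the plug's actual implementation): only text inside the listed tags is kept, which is also how "Main Content Only" skips ads and sidebars.

```python
from html.parser import HTMLParser

class TagFilter(HTMLParser):
    """Keep text only from tags named in `include_tags`,
    mimicking the plug's Include Tags option (hypothetical sketch)."""
    def __init__(self, include_tags):
        super().__init__()
        self.include_tags = set(include_tags)
        self._current = None
        self.extracted = []
    def handle_starttag(self, tag, attrs):
        self._current = tag
    def handle_endtag(self, tag):
        self._current = None
    def handle_data(self, data):
        # Only keep text whose enclosing tag was requested.
        if self._current in self.include_tags and data.strip():
            self.extracted.append(data.strip())

# A made-up page: the <div> plays the role of an ad/sidebar.
page = "<h1>Prices</h1><div>ad banner</div><p>Widget: $9.99</p>"
f = TagFilter(["h1", "p"])
f.feed(page)
print(f.extracted)  # the div's text is filtered out
```

Listing `h1, p` keeps headings and body text while the unlisted `div` is dropped, exactly the effect you get from the field in the plug.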
3. If you select web crawling
• Starting URL → Enter the URL where crawling begins.
Optional Fields:
Set Prompt → Give instructions (e.g., “Extract all page titles”).
Limit Pages → Define max number of pages (e.g., 100).
Allow External Links →
Yes: Crawl outside the domain.
No: Stay within domain.
Allow Subdomains →
Yes: Crawl subdomains.
No: Stay on the main domain only.
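The external-link and subdomain toggles decide which discovered links the crawler is allowed to follow. A minimal sketch of that decision logic, using only the standard library (the function name and URLs are illustrative, not part of the plug):

```python
from urllib.parse import urlparse

def allowed(url, start_url, allow_external=False, allow_subdomains=False):
    """Should a crawler honoring the options above follow `url`,
    given that the crawl started at `start_url`? (Sketch.)"""
    start_host = urlparse(start_url).hostname
    host = urlparse(url).hostname
    if host == start_host:
        return True  # same domain: always allowed
    if allow_subdomains and host and host.endswith("." + start_host):
        return True  # e.g. blog.example.com under example.com
    return allow_external  # anything else is an external link

start = "https://example.com/docs"
print(allowed("https://example.com/page2", start))
print(allowed("https://blog.example.com/post", start))
print(allowed("https://blog.example.com/post", start, allow_subdomains=True))
print(allowed("https://other.com", start, allow_external=True))
```

Limit Pages would simply be a counter around this check: stop enqueuing links once the configured maximum (e.g., 100) has been fetched.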
4. Test your step
Run a test to confirm the step returns the data you expect.
Continue building the rest of your workflow.
Example Use Cases
Scrape product prices from an e-commerce site and send updates to your spreadsheet.
Crawl a job board and collect all new postings for your industry.
Scrape headlines from a news site and send them to Slack for quick updates.
Crawl competitor websites to monitor new product launches or feature updates.