# I Got Tired of n8n Workflows That Silently Broke
I use n8n for lead generation. Domain lookups, contact enrichment, personalized outreach, all piped into a Google Sheet I use as a CRM. I've built some variation of this pipeline probably a dozen times.
The building part was never the real problem. The real problem was that workflows would deploy fine, return HTTP 200, and then quietly do nothing useful. A Hunter node would get a field name wrong and return zero contacts. A Code node would time out after 60 seconds because I was making LLM calls sequentially instead of in parallel. The Google Sheet would end up with 47 rows where 3 of them had actual email addresses and the rest were blanks.
I wouldn't find out until I opened the sheet hours later and realized the data wasn't there.
That's why I built N8N Workflow Creator. Not because I wanted another way to generate n8n nodes. I wanted something that would check whether the workflow actually did what I asked it to do.
## What it actually is
N8N Workflow Creator is a set of 8 skills you install into your AI coding agent. It works with Claude Code, GitHub Copilot CLI, Cursor, Codex, OpenCode, and a few others. You describe what you want in plain English, the skills auto-activate based on your prompt, and the agent builds the entire workflow through n8n's REST API.
No browser. No dragging nodes around. You describe a pipeline and it gets built, deployed, tested, and debugged programmatically.
The part that matters to me: the testing skill doesn't just check that the workflow ran. It checks every node's output, counts the items, verifies the fields exist, and confirms that data actually reached the destination. If something comes back empty, the debugging skill traces the data flow backward through every node until it finds where things went wrong.
## The stack I keep reaching for
I've settled on a stack for GTM lead generation that keeps showing up in every pipeline I build. Here's what each piece does and why I picked it.
### Exa AI for finding real people
The first problem in any outreach campaign is finding people who actually care about what you're building. Not scraping a generic list. Finding humans in specific communities who are already talking about your space.
Exa's neural search is good at this. I've been using it to find Instagram profiles for athletes and coaches in niche fitness communities, but the pattern works for any vertical. You give it a search query like "HYROX athlete training race competition results" scoped to instagram.com, and it returns URLs to actual posts and profiles.
The important part: you're extracting handles from real Instagram URLs, not asking an LLM to guess someone's username. About 50-60% of Exa results are post URLs where you can parse the handle out of the path. The rest are profile URLs. Either way, you're working with verified data.
A typical discovery campaign runs 12 queries with 25 results each. That's about 300 raw results for $0.06. After deduplication and handle extraction, you end up with 80-120 unique profiles. Not a massive list, but a real one.
### Hunter.io for email discovery and verification
Once you know who you want to reach, Hunter finds their email addresses and tells you whether they're actually deliverable. The domain search endpoint returns contacts along with verification status, so you don't burn a separate API call to check each one.
I filter hard at this stage. Only `valid` and `accept_all` statuses get through. Everything marked `invalid`, `unknown`, `disposable`, or `webmail` gets dropped. For B2B outreach, webmail addresses (Gmail, Yahoo) are almost never the right contact.
The skill also scores contacts on a 15-point scale based on title, seniority, and department. A VP of Marketing at a target company scores higher than a generic "team" email. The top-scoring contact becomes the To field, and the next two become CC. This isn't sophisticated, but it means I'm not emailing the intern.
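A sketch of that filter-and-rank step. The field names (`position`, `seniority`, `department`, `verification.status`) are modeled on Hunter's domain-search response, and the weights are illustrative, not the skill's actual rubric:

```python
KEEP = {"valid", "accept_all"}  # only deliverable statuses survive

def score(contact: dict) -> int:
    """Rough 15-point ranking by title, seniority, and department."""
    s = 0
    position = (contact.get("position") or "").lower()
    if any(k in position for k in ("vp", "head", "director", "chief")):
        s += 7
    if contact.get("seniority") in ("senior", "executive"):
        s += 5
    if contact.get("department") in ("marketing", "sales"):
        s += 3
    return s

def pick_recipients(contacts: list[dict]) -> tuple[dict | None, list[dict]]:
    """Top-scoring contact becomes To; the next two become CC."""
    deliverable = [
        c for c in contacts
        if c.get("verification", {}).get("status") in KEEP
    ]
    ranked = sorted(deliverable, key=score, reverse=True)
    return (ranked[0] if ranked else None, ranked[1:3])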
### OpenRouter for cold emails that don't sound like templates
Every cold email I've ever received from an AI-powered tool reads the same way. "I'd love to explore a transformative partnership that leverages synergies between our vibrant brands." Nobody reads past the first sentence.
The email outreach skill runs Claude through OpenRouter with 12 specific anti-AI rules baked into the prompt. No sycophantic openers. No significance inflation. No rule-of-three lists. No "delve" or "landscape" or "foster." The emails come out shorter and more direct because the prompt actively penalizes the patterns that make cold emails sound generated.
There's an optional second pass where the LLM audits its own output for remaining AI tells and rewrites them. It improves naturalness by maybe 30% and doubles the LLM cost, which at OpenRouter prices means going from fractions of a cent to slightly larger fractions of a cent.
The real constraint here is the 60-second timeout on n8n Cloud Code nodes. If you're generating emails for 24 contacts sequentially, each call taking 3-5 seconds, you blow past the limit and the node dies. The skill batches calls 12 at a time using `Promise.all` with a 1-second delay between batches. 24 contacts finish in about 10 seconds instead of 120.
One gotcha that cost me a few hours: arrow functions inside `Promise.all` lose the `this` binding in n8n Code nodes. You have to pass `this` as a parameter to the async function. The skill handles this, but it's the kind of thing that fails silently and produces `undefined` outputs with no error message.
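The batching pattern itself translates to any language. Here's the shape of it in Python, where `asyncio.gather` plays the role of `Promise.all` (the real Code node runs JavaScript); `generate_email` is a stand-in for the actual OpenRouter call:

```python
import asyncio

async def generate_email(contact: dict) -> str:
    # Stand-in for the real LLM call (3-5 seconds each in practice).
    await asyncio.sleep(0.01)
    return f"Hi {contact['name']}, ..."

async def generate_all(contacts: list[dict], batch_size: int = 12) -> list[str]:
    results: list[str] = []
    for i in range(0, len(contacts), batch_size):
        batch = contacts[i:i + batch_size]
        # Fire the whole batch concurrently, like Promise.all.
        results += await asyncio.gather(*(generate_email(c) for c in batch))
        if i + batch_size < len(contacts):
            await asyncio.sleep(1)  # pause between batches to respect rate limits
    return results

emails = asyncio.run(generate_all([{"name": f"c{i}"} for i in range(24)]))
print(len(emails))  # 24
```

Two batches of 12 running concurrently, plus one pause, is how 24 contacts come in well under the 60-second ceiling.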
### Google Sheets as a lightweight CRM
I use Google Sheets as my CRM. I know there are better tools. But for early-stage outreach where I'm testing messages and audiences, a spreadsheet with columns for email, CC, subject, body, name, position, and brand is all I need. The format is compatible with YAMM and Mailmeteor, so I can go from sheet to sent emails in about two clicks.
The Sheets integration handles appending rows, batch updates matched by key column, and rate limiting at 30 rows per batch with 3-second delays to stay under Google's API limits. For larger datasets (50+ contacts), the skill generates data externally in Python and pushes it through a webhook in batches.
The cleanup patterns are just as important as the write patterns. I've had sheets fill up with garbage rows where the enrichment step failed but the append step ran anyway. The skill now validates that email and brand name exist before writing anything.
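Sketched in Python, with `append_rows` standing in for whatever actually hits the Sheets API, the validate-then-batch pattern looks like this:

```python
import time

REQUIRED = ("email", "brand")  # assumed column names; adjust to your sheet
BATCH = 30   # rows per write
PAUSE = 3    # seconds between batches, to stay under Google's quota

def clean(rows: list[dict]) -> list[dict]:
    """Drop rows where enrichment failed before they ever hit the sheet."""
    return [r for r in rows if all(r.get(k) for k in REQUIRED)]

def write_in_batches(rows: list[dict], append_rows) -> int:
    """append_rows is a placeholder for the real Sheets API call."""
    rows = clean(rows)
    for i in range(0, len(rows), BATCH):
        append_rows(rows[i:i + BATCH])
        if i + BATCH < len(rows):
            time.sleep(PAUSE)  # rate limit between writes
    return len(rows)
```

Running `clean` before the first write is what keeps the half-enriched garbage rows out of the sheet.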
## The testing problem nobody talks about
Here's what I learned the hard way: HTTP 200 from an n8n webhook means n8n received your request. It does not mean the workflow executed. It definitely does not mean the workflow succeeded. And it absolutely does not mean data reached your Google Sheet.
The testing skill works like this:
- Record all existing execution IDs (the baseline)
- Trigger the workflow
- Poll the executions API every 3 seconds until a new execution appears
- Wait for that execution to finish (polling every 5 seconds, up to 5 minutes)
- Walk through every node in the execution data and check: did it produce items? Did it succeed? Do the output fields look right?
- Check the final destination node specifically: did rows actually get written?
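Condensed into code, the loop looks roughly like this. It's a sketch against n8n's public REST API (`GET /api/v1/executions`, `X-N8N-API-KEY` auth); `BASE` and the key are placeholders, and the per-node checks in the last two steps are elided:

```python
import time
import requests

BASE = "https://your-instance.app.n8n.cloud/api/v1"  # placeholder
HEADERS = {"X-N8N-API-KEY": "..."}                   # placeholder

def executions(workflow_id: str) -> set:
    """IDs of all executions recorded for this workflow."""
    r = requests.get(f"{BASE}/executions",
                     params={"workflowId": workflow_id}, headers=HEADERS)
    return {e["id"] for e in r.json()["data"]}

def run_and_verify(workflow_id: str, webhook_url: str) -> dict:
    baseline = executions(workflow_id)     # 1. record what already exists
    requests.post(webhook_url, json={})    # 2. trigger the workflow

    new_id = None
    for _ in range(20):                    # 3. poll until a NEW execution appears
        time.sleep(3)
        fresh = executions(workflow_id) - baseline
        if fresh:
            new_id = fresh.pop()
            break

    for _ in range(60):                    # 4. wait for it to finish (up to 5 min)
        r = requests.get(f"{BASE}/executions/{new_id}",
                         params={"includeData": "true"}, headers=HEADERS)
        execution = r.json()
        if execution.get("finished"):
            return execution               # 5-6. caller inspects per-node data
        time.sleep(5)
    raise TimeoutError(f"execution {new_id} never finished")
```

The baseline diff in step 3 is the important trick: it's the only reliable way to know the execution you're inspecting is the one you just triggered.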
If anything is wrong, it tells you which node failed, how many items it produced (usually zero), and what fields were in the output. That last part is critical because the number one cause of silent data loss in n8n is field name mismatches.
## Field mismatches: the silent killer
This is the bug I hit more than any other. Node A outputs a field called `domain`. Node B expects a field called `brand_domain`. n8n doesn't throw an error. It just passes the item through with an empty value for `brand_domain`, and every downstream node that depends on it produces garbage or nothing.
The debugging skill traces this by printing the actual field names at each node in the execution data:
```python
for name, runs in run_data.items():
    data = runs[0].get("data", {}).get("main", [[]])[0]
    if data:
        print(f"{name}: {list(data[0]['json'].keys())}")
```
When you see `Format Results: ['domain', 'email', 'name']` followed by `Hunter Search: []`, you know the Hunter node couldn't find the `brand_domain` field it was looking for. The fix is usually one line in an expression, but finding it without this trace can take an hour of clicking through the n8n UI.
## How the skills chain together
You don't invoke these skills one at a time. You describe what you want and they activate based on context.
Say you type: "Build me a workflow that takes company domains, finds contacts via Hunter, generates personalized cold emails, and writes everything to a Google Sheet for mail merge."
The full pipeline skill kicks in first. It checks that your n8n instance is reachable, finds your credentials (Hunter API key, Google Sheets OAuth, OpenRouter key), and plans the node sequence. Then the workflow building skill creates the nodes and wires the connections through the REST API. The API patterns skill handles the Code node logic for LLM calls with proper batching. The email outreach skill structures the contact ranking and email generation. The Sheets skill configures the output node with the right column format.
After deployment, the testing skill triggers the workflow and validates every node's output. If something breaks, the debugging skill takes over and traces the data flow.
The whole thing takes a few minutes from description to verified, working pipeline. I used to spend an afternoon on this.
## The connection wiring that trips people up
Building nodes programmatically through the REST API means you're writing connection objects by hand. The format isn't complicated, but there are a few things that will cost you time if you don't know about them.
A linear connection (Node A feeds Node B) looks like this:
```python
connections["Source Node"] = {
    "main": [[{"node": "Target Node", "type": "main", "index": 0}]]
}
```
A fan-out (one node feeds two downstream nodes) looks like this:
```python
connections["Format Results"] = {
    "main": [[
        {"node": "Add to Sheets", "type": "main", "index": 0},
        {"node": "Select Priority Contacts", "type": "main", "index": 0}
    ]]
}
```
The mistake I kept making: after updating a workflow via PUT, you have to cycle the activation. Deactivate, wait 3 seconds, activate, wait 5 seconds. If you skip this, webhook URLs don't update and your triggers point at the old workflow version. The 5-second wait isn't arbitrary. Webhook registration takes time to propagate, especially on n8n Cloud.
Another one: the `PUT` endpoint for updating a workflow only wants `name`, `nodes`, `connections`, `settings`, and `staticData` in the body. If you include `id`, `tags`, or `active`, it can silently ignore your changes or throw confusing errors.
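Both gotchas fit in one helper. This is a sketch against n8n's public API (`PUT /workflows/{id}`, `POST /workflows/{id}/activate` and `/deactivate`); `BASE` and the API key are placeholders:

```python
import time
import requests

BASE = "https://your-instance.app.n8n.cloud/api/v1"  # placeholder
HEADERS = {"X-N8N-API-KEY": "..."}                   # placeholder

ALLOWED = ("name", "nodes", "connections", "settings", "staticData")

def put_payload(workflow: dict) -> dict:
    """Strip everything the PUT endpoint doesn't want (id, tags, active, ...)."""
    return {k: workflow[k] for k in ALLOWED if k in workflow}

def update_and_cycle(workflow_id: str, workflow: dict) -> None:
    requests.put(f"{BASE}/workflows/{workflow_id}",
                 json=put_payload(workflow), headers=HEADERS)

    # Cycle activation so webhook URLs re-register against the new version.
    requests.post(f"{BASE}/workflows/{workflow_id}/deactivate", headers=HEADERS)
    time.sleep(3)
    requests.post(f"{BASE}/workflows/{workflow_id}/activate", headers=HEADERS)
    time.sleep(5)  # webhook registration needs time to propagate
```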
## Who this is for
I built this for GTM engineers who are wiring up lead generation pipelines in n8n by hand. If you're spending your afternoons dragging Hunter nodes and Code nodes around the UI, rebuilding the same enrichment-to-outreach-to-CRM pattern for different campaigns, this does that work for you.
It's also for anyone who's been burned by silent failures in n8n workflows. If you've ever deployed a workflow, assumed it was working, and found an empty Google Sheet the next morning, the testing and debugging skills exist because of that exact experience.
## Getting started
The project is open source under MIT. One install command for most platforms:
| Platform | Command |
|---|---|
| Claude Code | `/plugin install BandaruDheeraj/N8N-Workflow-Creator` |
| GitHub Copilot CLI | `copilot plugin install BandaruDheeraj/N8N-Workflow-Creator` |
| Cursor | `/plugin-add n8n-workflow-creator` |
For Codex, OpenCode, and manual installs, the full instructions are in the repo.
You'll need a running n8n instance (Cloud or self-hosted) with an API key, and credentials configured for whatever services your pipeline uses (Hunter, OpenRouter, Google Sheets, Exa).
The repo also works alongside czlonkowski/n8n-skills if you're already using that. Those skills teach the agent how to configure individual n8n nodes correctly; this project teaches it how to build, deploy, test, and debug complete workflows. The two complement each other.