The product launch that didn't make sense
I have been in email verification for over a decade. Before proxy25, I ran an email verification product. We were good at what we did — or so we thought. We ran head-to-head tests against competitors and came out ahead on valid detection rate. We launched with confidence.
Six weeks later, a customer came back with a bounce rate of 14 percent on a list our tool had classified as 94 percent valid.
We looked at the code. Nothing was wrong. We ran the list again. Similar output. We ran it through a competitor. Similar output. We spent two weeks inside the verification logic looking for a bug that did not exist.
The problem was not in the code. It was in how we had thought about what we were building. We had built a verification tool. What we needed to build was a verification system — four distinct layers, each with its own requirements, each dependent on the one above it being solid before it can return anything trustworthy.
We had built layer three before we fully understood what layers one, two, and four were supposed to do. And we paid for it in exactly the way you would expect: with customer complaints we could not diagnose and accuracy numbers we could not defend.
That experience — and the two years of fixing it properly — is what eventually led to proxy25. This is the complete picture of how to build it correctly.
The Four Layers — And Why the Sequence Is Not Optional
Most teams think of email verification as one thing. It is four things: format checks, DNS lookup, SMTP verification, and output classification. They are not interchangeable.
Build them in this sequence. Layer three built before layer two means you are burning SMTP capacity on domains that could have been eliminated in milliseconds with a DNS lookup. Layer four built without understanding the failure modes of layer three means you are presenting false confidence to customers who will discover the inaccuracy at the worst possible moment — when their campaign is live.
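The sequencing argument above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline we actually shipped: the function names, the loose format regex, and the result strings are all hypothetical, and the DNS and SMTP layers are passed in as plain callables so the sketch stays self-contained.

```python
import re
from typing import Callable

# Deliberately loose format check; a hypothetical stand-in for layer one.
_FORMAT_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def verify(address: str,
           has_mx: Callable[[str], bool],
           smtp_probe: Callable[[str], int]) -> str:
    """Run the layers in order and stop at the first one that fails."""
    # Layer 1: format. No network traffic at all.
    if not _FORMAT_RE.match(address):
        return "invalid:format"

    domain = address.rsplit("@", 1)[1].lower()

    # Layer 2: DNS. One lookup rules out dead domains before any SMTP session.
    if not has_mx(domain):
        return "invalid:dns"

    # Layer 3: SMTP. Only reached by addresses that survived layers 1 and 2.
    code = smtp_probe(address)

    # Layer 4: output classification (kept trivial in this sketch).
    if code == 250:
        return "valid"
    if code == 550:
        return "invalid:smtp"
    return "unknown"
```

The point of the shape is the early returns: an address on a domain with no mail exchanger never reaches `smtp_probe`, so no SMTP capacity is spent on it.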
The Output Classification Mistake That Costs You Customers
Here is the question that reveals whether a verification system was built with real understanding or assembled from documentation: what exactly does "valid" mean in your output?
Most tools return four categories. The problem is not the categories. The problem is that these four categories carry fundamentally different levels of confidence — and almost every tool presents them with identical certainty.
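"Valid" is your least reliable output category, not your most reliable one: a 250 from a clean IP and a 250 from a flagged IP look identical in your output, but they are not the same result. Almost nobody says this clearly. Your output classification should reflect the confidence level behind the category, not just the category itself.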
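One way to carry confidence alongside the category is to return both. The sketch below is an assumption-heavy illustration: the `Result` type, the `ip_is_clean` flag, and the "high/medium/low" labels are hypothetical names chosen for this example, and it covers only the three categories this article names (valid, invalid, unknown).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    category: str    # "valid", "invalid" or "unknown"
    confidence: str  # "high", "medium" or "low" (hypothetical labels)


def classify(smtp_code: Optional[int], ip_is_clean: bool) -> Result:
    """Attach a confidence level to the category instead of hiding it."""
    if smtp_code == 550:
        # A genuine rejection holds up across almost all infrastructure conditions.
        return Result("invalid", "high")
    if smtp_code == 250:
        # The same 250 means less when the probing IP is flagged.
        return Result("valid", "medium" if ip_is_clean else "low")
    # Timeouts, temporary failures, no response: unresolved, not unresolvable.
    return Result("unknown", "low")
```

With this shape, `classify(250, ip_is_clean=False)` and `classify(250, ip_is_clean=True)` both come back as "valid", but the customer can see that they are not the same result.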
You Are Measuring Accuracy Wrong
For the first two years of running the verification product, we measured accuracy the standard way: verify a list, send to it, measure the bounce rate, back-calculate what percentage of our "valid" results were actually deliverable. When the number looked bad, we went into the code.
Back-calculating accuracy from bounce rate conflates two completely different failure modes and presents them as one number. We spent months fixing logic that did not need fixing — because the infrastructure was feeding it bad inputs and we had no way to see the difference.
The measurement that actually diagnoses which problem you have is tracking two numbers separately: the false positive rate and the unknown rate.
Track them separately. The behaviour of each tells you something the other cannot. Conflating them into a single accuracy number means you keep applying the right fix to the wrong layer.
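As a rough sketch of what tracking them separately looks like, the two numbers in question are the false positive rate and the unknown rate. The definitions below are assumptions made for illustration: false positive rate is taken as the share of "valid" addresses that bounced when mailed, and unknown rate as the share of all checked addresses that came back unknown. The function and parameter names are hypothetical.

```python
def false_positive_rate(valid_sent: int, valid_bounced: int) -> float:
    """Share of addresses classified "valid" that bounced when mailed."""
    return valid_bounced / valid_sent if valid_sent else 0.0


def unknown_rate(total_checked: int, unknown_count: int) -> float:
    """Share of all checked addresses that came back "unknown"."""
    return unknown_count / total_checked if total_checked else 0.0


# Hypothetical figures, for illustration only.
fp = false_positive_rate(valid_sent=9_400, valid_bounced=470)
unk = unknown_rate(total_checked=10_000, unknown_count=1_200)
print(f"false positive rate: {fp:.1%}, unknown rate: {unk:.1%}")
```

Keeping the two as separate functions with separate denominators is the whole design choice: neither number can be folded into the other, so a change in one cannot be mistaken for a change in the other.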
The Build vs Buy Decision Most Teams Get Backwards
When we were deep in fixing the verification product, we considered building our own IP infrastructure. The logic seemed sound — own the infrastructure, control the costs, remove the vendor dependency.
We did not do it. And the reason we did not do it is the same reason I would tell any team today not to do it.
IP reputation is not something you build on a product timeline. The IPs that mail servers trust for SMTP verification have years of consistent, clean history behind them. Not months. Years. An IP that has been sending clean, well-paced, legitimate traffic for three years carries trust that an IP activated six months ago simply does not have — regardless of how cleanly the newer IP has behaved since activation.
You cannot build that history. You can only accumulate it. Which means the infrastructure you build today will not perform at the level you need for another two to three years — while your product is live, your customers are using it, and your accuracy numbers are being evaluated against competitors whose infrastructure was built years before yours.
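Source the infrastructure from providers who have already built that history, and build the verification logic yourself. That is the right division of labour. It took me two years and more customer complaints than I would like to remember to understand why.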
Key Takeaways
The four layers — format, DNS, SMTP, output classification — are not interchangeable. Build them in sequence or each layer below inherits the failures of the one above it.
"Valid" is your least reliable output category, not your most reliable one. A 250 from a clean IP and a 250 from a flagged IP look identical in your output. They are not the same result.
Invalid is the result you can trust most. A genuine 550 holds up across almost all infrastructure conditions.
High unknown rate is a signal about your infrastructure, not your data. It is unresolved, not unresolvable.
Track false positive rate and unknown rate separately. They diagnose different problems. Conflating them into a single accuracy number means you keep applying the right fix to the wrong layer.
IP reputation takes years to build. Source infrastructure from providers who have already built it. Build the logic yourself. That is the right division of labour.
Built for teams that want honest answers
proxy25 residential rotating proxies with port 25 access are purpose-built for teams that want their verification logic to get honest answers from mail servers.