Disclaimer: this project is intended exclusively for authorized testing environments, defensive security research, and educational purposes. It is not designed to facilitate real fraud or unauthorized activity.
When a security team wants to assess how robust their fraud detection system actually is, they usually have two options: run periodic audits, or wait for something to go wrong. Neither is particularly actionable.
The Fraud Evasion Penetration Testing Tool was built to fill that gap in a practical way. The goal is to give security assessors a way to simulate realistic evasion attempts against a Random Forest-based fraud detection model, read a bypass probability score, and get a report that explains which controls appear vulnerable and why.
The project ended up being rich enough to deserve a place in a portfolio. It brings together threat modeling, applied machine learning, software architecture, and web application hardening inside one coherent product.
What happens when you stress test a fraud detection system
Modern fraud detection systems work on signals. Anomalous geographic distance between two transactions, physically impossible travel velocity, inconsistency between expected and observed behavioral patterns, chip and PIN presence or absence, merchant history.
Each of these signals individually is manageable. The real problem surfaces when you test how the system reacts to combinations that are deliberately crafted to appear legitimate. That is the angle from which this tool was designed.
The main attack vectors modeled in the project are:
- geolocation spoofing and distance anomalies
- velocity checks and impossible travel scenarios
- card configuration: chip, PIN, expected behavior
- retailer history as a contextual signal
- behavioral baselining as a second layer control
Each parameter has a recognizable meaning in real fraud detection systems. Using them together as tunable test levers makes the tool considerably more useful than a generic risk slider dashboard.
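Taken together, those levers can be modeled as a single scenario object that a simulation run consumes. A minimal sketch, assuming hypothetical field names (the article does not publish the real schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvasionScenario:
    """One tunable test configuration. All field names are illustrative."""
    distance_km: float         # geographic gap between consecutive transactions
    speed_kmh: float           # implied travel velocity between them
    chip_used: bool            # chip present on the card
    pin_used: bool             # PIN entered at the terminal
    merchant_known: bool       # retailer seen before for this card
    behavior_deviation: float  # distance from the behavioral baseline, 0..1
```

Freezing the dataclass keeps a scenario immutable once defined, so a test run cannot silently mutate its own configuration.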
The output that actually matters in an assessment
The Random Forest model calculates a Success Score, which is the estimated probability that a transaction with specific characteristics will pass through the controls undetected. But a number by itself is not very useful.
What was built on top of the model is an explainability layer. The vulnerability report that accompanies every simulation tells you which controls fired, which signals appear inconsistent, and where the detection logic shows gaps relative to the tested configuration. That type of output is what ends up in risk team presentations, resilience evaluations, and formal reports.
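A hedged sketch of what such an explainability layer could look like: raw signals are checked against per-control rules, and the report lists which controls fired and which stayed silent. Control names and thresholds here are illustrative, not the project's actual detection logic:

```python
# Hypothetical report layer: map raw signals to the controls they trip.
CONTROLS = {
    "distance_anomaly": lambda s: s["distance_km"] > 500,
    "impossible_travel": lambda s: s["speed_kmh"] > 900,
    "chip_missing": lambda s: not s["chip_used"],
}

def vulnerability_report(signals: dict, success_score: float) -> dict:
    """Attach explainability to a bare model score."""
    fired = [name for name, rule in CONTROLS.items() if rule(signals)]
    return {
        "success_score": round(success_score, 3),
        "controls_fired": fired,
        # Controls that stayed silent are the candidate gaps to review.
        "silent_controls": sorted(set(CONTROLS) - set(fired)),
    }
```

The value of this shape is that the silent controls, not the fired ones, are what a risk team wants to see for a configuration with a high score.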
The optional MaxMind GeoLite2 integration adds another layer of realism. Instead of entering distances manually, real IP addresses can be resolved into locations and geographic anomalies can be calculated automatically. Keeping it optional avoids mandatory dependencies, but when enabled it makes test scenarios considerably more precise.
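Once an IP resolves to coordinates, the geographic-anomaly step reduces to a great-circle distance plus an implied travel speed. A self-contained sketch of that calculation (the GeoLite2 lookup itself, done via MaxMind's client library, is omitted here):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two coordinates."""
    # Convert degrees to radians once, up front.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~ 6371 km

def travel_speed_kmh(distance_km: float, elapsed_hours: float) -> float:
    """Implied velocity for an impossible-travel check."""
    return distance_km / max(elapsed_hours, 1e-9)  # guard against zero elapsed time
```

Two transactions a thousand kilometres apart within one hour imply a speed no commercial traveler reaches, which is exactly the kind of derived signal the velocity control consumes.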
Why the architecture is part of the project, not a footnote
From day one it was clear the project needed to work on two distinct levels. One for visibility. One for actual use.
The backend edition is the serious version. It is built with Flask, organized with an application factory pattern, split into routes, a service layer, validation modules, and security components. The model is loaded as an artifact trained offline through a dedicated script. Server side input validation is strict. Configurations are separated by environment. GeoLite2 integration, when present, is handled locally.
The frontend demo edition is designed for public visibility. It is published as a static site on GitHub Pages and reproduces the same product experience in the browser without requiring a Python backend. Calculations use demonstration logic and static JSON data shared between the two editions. A clear notice informs visitors that they are interacting with a frontend educational demo.
The model and the training logic
The choice of Random Forest for this task reflects practical reasoning. It handles tabular data with heterogeneous features well. It works naturally with mixed signal types such as booleans, continuous numerics, and categoricals without requiring rigid normalization. It produces probabilistic outputs that can be explained.
The training flow is deliberately separated from the application. A dedicated script prepares the dataset, trains the model, and produces a serialized artifact, fraud_model.pkl. The application then loads that artifact and uses it during inference. That separation between training and serving is intentional because merging the two processes makes the system brittle and difficult to update independently.
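The train-then-serve split could look roughly like this, with synthetic data and hypothetical feature names standing in for the real pipeline:

```python
# Offline training step (illustrative: data and features are synthetic).
import pickle
import random

from sklearn.ensemble import RandomForestClassifier

random.seed(0)
# Synthetic tabular rows: [distance_km, speed_kmh, chip_used, pin_used]
X = [[random.uniform(0, 5000), random.uniform(0, 2000),
      random.randint(0, 1), random.randint(0, 1)] for _ in range(400)]
# Toy label: flag fast, distant, chip-less transactions.
y = [int(r[0] > 2000 and r[1] > 800 and r[2] == 0) for r in X]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
with open("fraud_model.pkl", "wb") as f:
    pickle.dump(model, f)  # the serialized artifact the app consumes

# Serving side: the application only ever loads the artifact.
with open("fraud_model.pkl", "rb") as f:
    served = pickle.load(f)
score = served.predict_proba([[3000.0, 1200.0, 0, 1]])[0][1]  # Success Score
```

Because the app sees nothing but the .pkl file, swapping in a retrained model is a file replacement, not a code change.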
In a production context that boundary becomes even more important. The model can be versioned, replaced, and retrained on new data while the application remains stable. That structure reflects a way of thinking about how ML systems are supposed to live over time.
Hardening: you cannot build a security tool with an insecure web app
A project that talks about evasion and assessment has to be consistent about its own attack surface. For that reason the backend edition includes a hardening plan designed as a minimum baseline for any exposed Flask application.
The main areas covered are:
Validation and input safety. Every POST endpoint has strict server side validation with type checks, range checks, normalization, and rejection of malformed or oversized payloads. No client supplied data is trusted.
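A minimal illustration of that kind of strict validation, with hypothetical field names and limits:

```python
MAX_PAYLOAD_FIELDS = 20  # reject oversized payloads outright (illustrative limit)

def validate_simulation_payload(payload: dict) -> dict:
    """Strict server-side validation: type checks, range checks, normalization."""
    if not isinstance(payload, dict) or len(payload) > MAX_PAYLOAD_FIELDS:
        raise ValueError("malformed or oversized payload")
    try:
        distance = float(payload["distance_km"])   # type check + coercion
        chip = bool(payload["chip_used"])          # normalize to a real bool
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("invalid or missing field") from exc
    if not 0 <= distance <= 20000:                 # range check: half Earth's circumference
        raise ValueError("distance_km out of range")
    return {"distance_km": distance, "chip_used": chip}  # normalized copy only
```

Returning a fresh normalized dict, rather than the client-supplied one, is the detail that enforces "no client supplied data is trusted" downstream.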
Security headers. Content-Security-Policy, X-Content-Type-Options, X-Frame-Options, Referrer-Policy, and Strict-Transport-Security are configured as defaults, not as afterthoughts.
CSRF and request safety. State changing forms use CSRF protection. Sensitive routes accept only POST.
Rate limiting. Scoring and reporting endpoints have frequency limits to prevent automated abuse and scraping.
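In Flask this is commonly delegated to Flask-Limiter; the underlying idea can be sketched with a small in-memory sliding-window counter (illustrative only; a production deployment would usually back this with a shared store such as Redis):

```python
import time
from collections import defaultdict

class SlidingWindowLimiter:
    """Minimal in-memory rate limiter: at most N requests per window per client."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(list)  # client id -> request timestamps

    def allow(self, client_id, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        self._hits[client_id] = hits = [
            t for t in self._hits[client_id] if now - t < self.window
        ]
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject this request
        hits.append(now)
        return True
```

A scoring endpoint would call `allow()` with the client identifier before doing any work and return HTTP 429 on False.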
Privacy aware logging. Structured logs mask IP addresses and sensitive inputs. No unnecessary plaintext data in log output.
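IP masking in particular can be implemented as a logging filter. A stdlib-only sketch, assuming a `client_ip` attribute on the log record (a hypothetical convention, not the project's documented one):

```python
import logging

def mask_ip(ip: str) -> str:
    """Zero the host octet: 203.0.113.45 -> 203.0.113.0 (IPv4 only here)."""
    parts = ip.split(".")
    if len(parts) == 4:
        return ".".join(parts[:3]) + ".0"
    return "masked"  # anything unexpected is fully redacted

class IPMaskingFilter(logging.Filter):
    """Rewrites the client_ip attribute before the record is formatted."""

    def filter(self, record):
        if hasattr(record, "client_ip"):
            record.client_ip = mask_ip(record.client_ip)
        return True  # never drop the record, only scrub it
```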
Secret management. No keys in code. Everything through environment variables with a documented .env.example.
Error handling. In production no stack traces are exposed. Only structured error responses that reveal nothing about the implementation.
Minimized surface. No unnecessary features. The GeoLite2 database is local and not exposed. The public demo contains no secrets.
What this means for anyone evaluating the profile
This tool was built because the goal was a portfolio piece that shows more than a single skill. A pentest report shows methodology. A notebook shows ML. This project shows how you put together a system with a real objective, a defensible architecture, attention to application security, and an accessible presentation layer.
It also represents a way of working. Understanding the problem deeply, building something that can be extended, not stopping at the working prototype. The kind of role being pursued is one where security, development, and systems thinking genuinely overlap. This project moves in exactly that direction.
The code is public. The demo is live. The backend is documented. If you want to discuss architecture decisions, model choices, or the hardening plan in detail, reach out.
Live demo is HERE.
Project notes
The frontend demo of this tool is publicly available as an educational and demonstrative version.
The full source code of the backend, including the model training pipeline and the real application logic, is not publicly available for responsible security reasons. Given the nature of the project, which simulates evasion techniques against fraud detection systems, the operational code is kept in controlled environments and used exclusively in authorized testing contexts.
For collaboration requests, code reviews, or technical discussions, feel free to reach out directly.

Copyright
All content in this article, including text, structure, analysis, and architectural descriptions, is the intellectual property of the author.
Reproduction in whole or in part without explicit written permission is prohibited. Short quotations are permitted provided they include clear attribution and a link to the original article.