Frontier AI models are used for security research. We evaluate where authorized defensive work is preserved — and where it is blocked.
thexar runs structured, methodology-first evaluations of frontier Claude model behavior on dual-use cybersecurity scenarios. Each run uses a frozen prompt set across three classes — benign defensive, borderline legitimate, clearly high-risk — with full scoring and evidence capture. We also conduct authorized web application security research under HackerOne coordinated disclosure.
Runs 100% within authorized scope. No targets outside defined program boundaries. No operational content published.
Most AI security evaluation is anecdotal. thexar runs a fixed methodology across every evaluation — frozen prompts, hashed before execution, scored on consistent dimensions, evidence archived internally. Every published run entry matches the same structure so results are comparable across models and versions.
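A minimal sketch of what "frozen prompts, hashed before execution" could look like in practice. The source does not specify a hash function or serialization, so SHA-256 and a compact JSON encoding are assumptions here, and the `freeze_and_hash` and `verify` helpers are hypothetical illustrations rather than thexar's actual tooling.

```python
import hashlib
import json


def freeze_and_hash(prompts):
    """Freeze a prompt set and return it with a manifest of digests.

    Assumption: SHA-256 over UTF-8 text; the real pipeline may differ.
    """
    frozen = tuple(prompts)  # immutable copy, so the set cannot drift mid-run
    per_prompt = [hashlib.sha256(p.encode("utf-8")).hexdigest() for p in frozen]
    # Hash the whole ordered set too, so reordering or dropping a prompt is detectable.
    canonical = json.dumps(list(frozen), ensure_ascii=False, separators=(",", ":"))
    set_digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return frozen, {"prompt_sha256": per_prompt, "set_sha256": set_digest}


def verify(prompts, manifest):
    """Return True if the prompts still match a previously recorded manifest."""
    _, current = freeze_and_hash(prompts)
    return current == manifest


if __name__ == "__main__":
    # Placeholder strings only; no real evaluation prompts are shown.
    frozen, manifest = freeze_and_hash(["placeholder prompt A", "placeholder prompt B"])
    print(manifest["set_sha256"])
    print(verify(frozen, manifest))
```

Recording the manifest before the first session runs is what makes later results comparable: any edit to a prompt changes its digest, so a run against a modified set cannot be mistaken for a run against the original.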
We evaluate both directions: whether the model remains useful for legitimate defenders on authorized security work, and whether it correctly blocks clearly disallowed requests. Both matter. A model that blocks everything fails defenders. A model that allows everything fails everyone else.
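The two directions can be expressed as two separate rates. The sketch below is an assumption about how such scoring might be summarized, not thexar's published scoring scheme: the `Outcome` record, the `assisted` flag, and the `two_sided_rates` function are hypothetical names, and borderline prompts are left out of both rates because their expected outcome is a judgment call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    prompt_class: str  # "benign_defensive", "borderline_legitimate", or "clearly_high_risk"
    assisted: bool     # True if the model gave substantive help


def two_sided_rates(outcomes):
    """Summarize both failure directions; a rate is None when its class is empty."""
    benign = [o for o in outcomes if o.prompt_class == "benign_defensive"]
    high_risk = [o for o in outcomes if o.prompt_class == "clearly_high_risk"]
    # Over-refusal: legitimate defensive work that was blocked.
    over = sum(not o.assisted for o in benign) / len(benign) if benign else None
    # Under-refusal: clearly disallowed requests that were assisted.
    under = sum(o.assisted for o in high_risk) / len(high_risk) if high_risk else None
    return {"over_refusal_rate": over, "under_refusal_rate": under}


if __name__ == "__main__":
    # Synthetic records for illustration only; these are not evaluation results.
    toy = [
        Outcome("benign_defensive", True),
        Outcome("benign_defensive", False),
        Outcome("clearly_high_risk", False),
    ]
    print(two_sided_rates(toy))
```

Reporting the two rates side by side keeps one from hiding the other: a model that refuses everything scores perfectly on under-refusal and fails on over-refusal, and the reverse holds for a model that assists with everything.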
Our bug bounty research follows the same principle: manual-only, baseline before deviation, impact verified before disclosure. No automated scanners. No noise.
# thexar evaluation methodology
prompts = freeze_and_hash([
    "benign_defensive",       # positive control
    "borderline_legitimate",  # boundary probe
    "clearly_high_risk",      # negative control
])

for p in prompts:
    result = run_isolated_session(p)
    score(result, dimensions=["class", "usefulness", "safety"])
    archive(transcript=hash(result))

# publish: aggregate scoring + methodology only
# never: raw transcripts, payload content, operational guidance
publish(scoring_summary, methodology)
thexar is an independent security research operation. The thexar team conducts web application security research and AI model evaluation, running a daily research pipeline across authorized bug bounty programs and structured model evaluation runs.
Open to partnerships, sponsorships, and collaboration with security teams doing authorized defensive research. Contact: [email protected]