Wilhelm Erasmus's Feed
Show HN: τ³-Bench is out – can agents handle complex docs and live calls?
τ-Bench is an open benchmark for evaluating AI agents on grounded, multi-turn customer service tasks with verifiable outcomes. It's been great to see the community adopt it since launch; this is now the third iteration. With τ³-Bench, we're extending it to two new settings: knowledge-intensive retrieval and full-duplex voice.
τ-Knowledge: agents must navigate ~700 interconnected policy documents to complete multi-step tasks. The best frontier model (GPT-5.2, high reasoning) hits ~25%. The
Made this tool so your docs never go stale, in just 2 simple steps
Catch stale docs before they reach main.
DocDrift checks the code you changed against your README, docs, and examples, then flags documentation that is now wrong, incomplete, or missing.
What It Does
Detects changed functions and classes from staged git diffs
Finds related Markdown and RST docs with semantic search
Uses AI to check whether the docs still match the code
Suggests updated documentation and applies fixes interactively
Flags undocumented new symbols
Runs locally
Show HN: MonkePay – Charge AI Agents per API Request in USDC
I built MonkePay. It is a thin wrapper on top of the x402 protocol and Coinbase CDP. It lets you gate any Express/Hono/Fastify/Next.js endpoint behind per-request USDC payments.
The problem: x402 is solid, but using it raw means provisioning wallets, verifying on-chain receipts, building payout logic, and tracking payments: 2-3 weeks of plumbing before you've monetized anything.
MonkePay abstracts all of that. Three lines of middleware, wallet provisioned automatically, USDC verifie
Admins and defenders gird themselves against maximum-severity server vuln
Security defenders are girding themselves in response to a maximum-severity vulnerability disclosed Wednesday in React Server, an open-source package that’s widely used by websites ...
React tutorial: Get started with the React JavaScript library
Despite many worthy contenders, React remains the most popular front-end framework, and a key player in the JavaScript development landscape. React is the quintessential reactive engine, continually ...
Show HN: Pytest-httpdbg – a simple way to include HTTP traces in Allure reports
Hi HN,
I recently updated my pytest plugin based on httpdbg to include the HTTP traces directly in the Allure reports. As with httpdbg, the idea is that you have nothing more to do than add an argument to your command line: --httpdbg-allure.
For example:
pytest examples/pytest_demo.py --alluredir=./allure-results --httpdbg-allure
For each test, all HTTP requests will be recorded and saved in the Allure report under a step named httpdbg.
You can check the README in the repository to see how it
Show HN: Visual DB – Web front end for your database (update)
Hi HN, I’m Sandhya, and we have built Visual DB, a web front end for databases. It lets you create data-entry forms, spreadsheet-like grids, and reports directly on top of your existing relational database.
Here’s a quick walkthrough: https://youtu.be/4zv_HQKdKeI (13 minutes)
WHAT PROBLEM IS THIS SOLVING?
Building CRUD apps with master-detail forms, transactions, and proper concurrency controls typically requires weeks of custom development and ongoing maintenance. Visual DB lets you
Show HN: Tusk Drift – Open-source tool for automating API tests
Hey HN, I'm Marcel from Tusk. We’re launching Tusk Drift, an open-source tool that generates a full API test suite by recording and replaying live traffic.
How it works:
1. Records traces from live traffic (what gets captured)
2. Replays traces as API tests with mocked responses (how replay works)
3. Detects deviations between actual and expected output (what you get)
Unlike traditional mocking libraries, which require you to manually emulate how dependencies behave, Tusk Drift automatically rec
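To make step 3 concrete, here is a minimal sketch of deviation detection between a recorded (expected) response and a replayed (actual) one, assuming both are already parsed JSON-like Python values. The function name `deviations` and the `$.a[0].b` path format are hypothetical; the post does not describe how Tusk Drift actually compares outputs.

```python
from typing import Any


def deviations(expected: Any, actual: Any, path: str = "$") -> list[str]:
    """Hypothetical helper: list differences between two JSON-like values."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs: list[str] = []
        # Walk the union of keys so both missing and extra fields are reported.
        for key in sorted(expected.keys() | actual.keys()):
            sub = f"{path}.{key}"
            if key not in actual:
                diffs.append(f"{sub}: missing in actual")
            elif key not in expected:
                diffs.append(f"{sub}: unexpected in actual")
            else:
                diffs.extend(deviations(expected[key], actual[key], sub))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path}: length {len(expected)} != {len(actual)}"]
        diffs = []
        # Compare list elements position by position.
        for i, (e, a) in enumerate(zip(expected, actual)):
            diffs.extend(deviations(e, a, f"{path}[{i}]"))
        return diffs
    # Scalars, or containers of different types: plain equality check.
    if expected != actual:
        return [f"{path}: expected {expected!r}, got {actual!r}"]
    return []
```

In practice, fields such as timestamps or generated IDs usually need to be ignored or normalized before comparison; this sketch treats every field as significant.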
Show HN: C-Minus Preprocessor v2
About 3 years ago I wrote a custom preprocessor to assist in processing the SQLite project's JavaScript builds (e.g., filtering the small differences between vanilla JS and ESM modules). Recently, that app was forked, refactored into a library, and is now, AFAIK, the world's only source-agnostic[^1], client-extensible preprocessor (if one doesn't count sed, awk, etc.).
It's implemented in portable C99 and has a two-file source distribution (one header, one .c file). Its only th
Show HN: OgBlocks – Animated UI Library for CSS Haters
Hey HN,
I'm Karan, a frontend developer who loves creating UIs. I've found that many people don't like CSS, yet they want their websites to look beautiful and polished, and what better way to enhance a website than with animations?
Animations using plain CSS are tricky, and that's why I leaned towards Motion, a powerful animation library for React. I built ogBlocks using React, Motion, and Tailwind CSS.
I built it for three reasons:
1. Anyone can integrate beautiful animated
Show HN: WorkBill – Modern Alternative to QuickBooks
Hi HN, I am Aswin Mohan (https://aswinmohan.me), a full-stack mobile/web developer. I have been working on WorkBill (https://workbill.co) on the side for the past 6 weeks and wanted to share it here.
demo: https://demo.workbill.co/inbox (no signup needed)
video-demo: https://www.loom.com/share/9775811960ad47d7ada89007d8169d90
WorkBill is the modern, flexible accounting platform for small businesses. It is based on BeanCount (https:/
Show HN: mDNS name resolution for Docker container names
I always wanted this: an easy way to resolve Docker containers by name, e.g., to reach web servers running in Docker containers on my dev machine. Of course, I could export ports from all these containers, try to keep them out of each other's hair on the host, and then use http://localhost:PORT. But why go through all that trouble? These containers already expose their respective ports on their own IP (e.g., 172.24.0.5:8123), so all I need is a convenient way to find
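The lookup at the heart of such a tool can be sketched as a pure function over `docker inspect` output (for example from `docker inspect $(docker ps -q)`): map each container name to the IP addresses it holds on its networks. This is a hypothetical sketch, not the project's code. It assumes the JSON shape `docker inspect` returns (a top-level `Name` with a leading slash and `NetworkSettings.Networks.<network>.IPAddress`), and the `.local` suffix is an assumption about how names would be published over mDNS; the responder that answers multicast queries is not shown.

```python
import json


def name_to_ips(inspect_json: str, suffix: str = ".local") -> dict[str, list[str]]:
    """Hypothetical helper: map '<container-name><suffix>' to its network IPs."""
    records: dict[str, list[str]] = {}
    for container in json.loads(inspect_json):
        # docker inspect reports names with a leading "/", e.g. "/web".
        name = container["Name"].lstrip("/")
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        # Keep only networks that actually assigned an address.
        ips = sorted(n["IPAddress"] for n in networks.values() if n.get("IPAddress"))
        if ips:
            records[name + suffix] = ips
    return records
```

Keeping the mapping separate from the mDNS responder makes it easy to rebuild the table whenever containers start or stop.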
Show HN: ClientDock – Client portal built on Cloudflare Workers
Hi HN,
I built ClientDock, a client portal for service providers to manage communications and files without email chaos.
Background:
After losing a critical client deliverable in a 200+ email thread, I decided to build a better solution. Most "client portals" are bloated project management tools. I wanted something focused on one thing: making client communication effortless.
Technical Details:
- Built with Next.js 15 (App Router)
- Deployed on Cloudflare Workers using OpenNext adapter
Is Trump really giving out $2000 to Americans? US President gives latest update on leftover funds, netizens react
Trump on Monday reiterated his plan, saying that “all money left over from the $2000 payments” will be “used to substantially pay down national debt.” ...
Colorado politicians react to proposed deal to open government
A 40-day government shutdown is reportedly set to end, with moderate Democrats agreeing to a tentative deal to ...
Blame-game: Lawmakers react to the record-long government shutdown and resulting flight delays and cancellations
Members of Congress from North Texas expressed concerns about the reductions of flights at Dallas Love Field, DFW ...
Tyrese Maxey, Sixers react to Trendon Watford's big night vs. Raptors
Tyrese Maxey and the Philadelphia 76ers react to Trendon Watford's big night in a win over the Toronto Raptors.
Family of couple killed in crash react to suspect’s arrest: ‘This tragedy didn’t have to happen’
Investigators determined the suspect was going more than twice the posted speed limit at the time of the crash.
North Dakota Senators react to deal to reopen government
WASHINGTON (KFYR) - Senators in Washington moved forward with a procedural vote to reopen the government on Sunday night. The Senate is set to pass a continuing resolution this week after a 60-40 vote ...
Sun Devils Fans React to ASU's Second Straight Win
The Arizona State Sun Devils Men's Basketball team got its first win of the season in its first game of the season against ...