Wilhelm Erasmus's Feed
Show HN: Τ³-Bench is out – can agents handle complex docs and live calls?
τ-Bench is an open benchmark for evaluating AI agents on grounded, multi-turn customer service tasks with verifiable outcomes. It's been great to see the community adopt it since launch; this is now the third iteration. With τ³-Bench, we're extending it to two new settings: knowledge-intensive retrieval and full-duplex voice. τ-Knowledge: agents must navigate ~700 interconnected policy documents to complete multi-step tasks. Best frontier model (GPT-5.2, high reasoning) hits ~25%. The
Show HN: MonkePay – Charge AI Agents per API Request in USDC
I built MonkePay, a thin wrapper on top of the x402 protocol and Coinbase CDP. It lets you gate any Express/Hono/Fastify/Next.js endpoint behind per-request USDC payments. The problem: x402 is solid, but using it raw means provisioning wallets, verifying on-chain receipts, building payout logic, and tracking payments: 2-3 weeks of plumbing before you've monetized anything. MonkePay abstracts all of that. Three lines of middleware, wallet provisioned automatically, USDC verifie
Made this tool so your docs never get stale, in just 2 simple steps
Catch stale docs before they reach main. DocDrift checks the code you changed against your README, docs, and examples, then flags documentation that is now wrong, incomplete, or missing.
What It Does
Detects changed functions and classes from staged git diffs
Finds related Markdown and RST docs with semantic search
Uses AI to check whether the docs still match the code
Suggests updated documentation and applies fixes interactively
Flags undocumented new symbols
Runs locally
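The first step in the list above (detecting changed functions and classes from a diff) can be sketched roughly as follows. This is not DocDrift's actual code: `changed_symbols` and `SYMBOL_RE` are hypothetical names, the sketch assumes Python sources, and it only catches symbols whose definition line itself was added or removed, not edits inside a function body.

```python
import re

# Hypothetical helper, not DocDrift's implementation: scan unified-diff text
# (for example the output of `git diff --cached`) for Python definitions
# that appear on added (+) or removed (-) lines.
SYMBOL_RE = re.compile(r"^[+-]\s*(?:async\s+)?(def|class)\s+([A-Za-z_]\w*)")


def changed_symbols(diff_text: str) -> list[str]:
    """Return function/class names whose definition line was added or removed."""
    names: list[str] = []
    for line in diff_text.splitlines():
        # Skip file headers such as '--- a/app.py' and '+++ b/app.py'.
        if line.startswith(("+++", "---")):
            continue
        match = SYMBOL_RE.match(line)
        if match and match.group(2) not in names:
            names.append(match.group(2))
    return names
```

A fuller version would also map changed hunks back to the enclosing definition, which is where a real parser would be a better fit than a regular expression.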
Admins and defenders gird themselves against maximum-severity server vuln
Security defenders are girding themselves in response to the disclosure of a maximum-severity vulnerability disclosed Wednesday in React Server, an open-source package that’s widely used by websites ...
React tutorial: Get started with the React JavaScript library
Despite many worthy contenders, React remains the most popular front-end framework, and a key player in the JavaScript development landscape. React is the quintessential reactive engine, continually ...
macOS 26 breaks custom DNS settings including .internal
One of those 'woke up to macOS updates' mornings: none of my Docker containers are reachable via dnsmasq (which I use), and lo and behold, an update has silently broken custom DNS resolution. Hopefully Apple will listen to the bug report I've made. Hold off on updating if you use this…
Show HN: MDX Docs – a lightweight React framework for documentation sites
Hey HN! I’m Ezra, the creator of MDX Docs. I built this because I wanted a fast, simple way to document components using Markdown and React together with MDX. The goal was to keep things really straightforward: pages are just MDX files, and they map directly to routes. You can write docs and drop in React components right alongside them without much setup. It also includes a CLI:
npx create-mdx-docs@latest my-docs
I’ve been using it to spin up docs sites quickly, and it’s been a really nice workflow
Show HN: Three new Kitten TTS models – smallest less than 25MB
Kitten TTS (https://github.com/KittenML/KittenTTS) is an open-source series of tiny and expressive text-to-speech models for on-device applications. We had a thread last year here: https://news.ycombinator.com/item?id=44807868. Today we're releasing three new models with 80M, 40M and 14M parameters. The largest model (80M) has the highest quality. The 14M variant reaches new SOTA in expressivity among similar sized models, despite being <25MB in size. Thi
Preventing agent drift: A guide to shipping serious code via vibe-coding
I have been experimenting with agents on large Java codebases, and here's my opinion: integrate and max out rule-based guardrails such as 100% unit test coverage, mutation test coverage, and 0% Sonar errors, and use OpenRewrite as a default skill for refactoring. Check out the blog post I wrote on this: https://www.jaipilot.com/blog/preventing-agent-drift-serious-code-vibe-coding
Show HN: React isn't the terminal UI bottleneck, the output pipeline is
Anthropic rewrote Claude Code's terminal renderer and found that React wasn't the problem. Ink's line-level rewriting was. I built their approach into a standalone library. CellState uses a custom React reconciler that renders directly to a cell grid and diffs frame-by-frame at the cell level. You keep native terminal behavior (scrolling, text selection, Cmd+F) because it runs inline instead of alternate screen. React's reconciler only touches the subtree that changed, and the
Launch HN: Voltair (YC W26) – Drone and charging network for power utilities
Hey HN! We’re Hayden, Ronan, Avi, and Warren of Voltair (https://voltairlabs.com/). We’re making weatherized, hybrid-fixed drones deployed for power utility inspections. Here’s some footage: https://vimeo.com/1173862237/ac28095cc6?share=copy&fl=sv&fe=... and a photo of our latest prototype: https://imgur.com/a/bYHnqZ4. The U.S. has 7M miles of power lines (enough to go to the moon and back 14 times), and they're aging. Over 50% of
Show HN: Built a zero config proxy that lets Claude control your React App
I built a proxy to let Claude (or any other agent) take control of your React application. I basically built it out of my own need and as an experiment to learn a thing or two.
It sits between the dev server and the browser, allowing the agent to intercept, modify, and navigate your app. It can test user journeys, backend outages, and so on, and fix your app if it breaks.
I tried to secure it as much as I could, and any contribution or feedback is really appreciated! Hope you guys enjoy it
New React bug that can drain all your tokens is impacting 'thousands of' websites
A critical vulnerability in React Server Components is being actively exploited by multiple threat groups, putting thousands of websites — including crypto platforms — at immediate risk with users ...
Polk React review: Built-in Alexa soundbar for your TV
TV and home video editor Ty Pendlebury joined CNET Australia in 2006, and moved to New York City to be a part of CNET in 2011. He tests, reviews and writes about the latest TVs and audio equipment.
Show HN: Visual DB – Web front end for your database (update)
Hi HN, I’m Sandhya and we have built Visual DB, a web front end for databases. It lets you create data-entry forms, spreadsheet-like grids, and reports directly on top of your existing relational database. Here’s a quick walkthrough: https://youtu.be/4zv_HQKdKeI (13 minutes)
WHAT PROBLEM IS THIS SOLVING?
Building CRUD apps with master-detail forms, transactions, and proper concurrency controls typically requires weeks of custom development and ongoing maintenance. Visual DB lets you
Show HN: Pytest-httpdbg – a simple way to include HTTP traces in Allure reports
Hi HN, I recently updated my pytest plugin based on httpdbg to include the HTTP traces directly in the Allure reports. As with httpdbg, the idea is to have nothing more to do than to add an argument to your command line: --httpdbg-allure. For example:
pytest examples/pytest_demo.py --alluredir=./allure-results --httpdbg-allure
For each test, all HTTP requests will be recorded and saved in the Allure report under a step named httpdbg. You can check the README in the repository to see how it
Show HN: Tusk Drift – Open-source tool for automating API tests
Hey HN, I'm Marcel from Tusk. We’re launching Tusk Drift, an open source tool that generates a full API test suite by recording and replaying live traffic. How it works:
1. Records traces from live traffic (what gets captured)
2. Replays traces as API tests with mocked responses (how replay works)
3. Detects deviations between actual vs. expected output (what you get)
Unlike traditional mocking libraries, which require you to manually emulate how dependencies behave, Tusk Drift automatically rec
Show HN: C-Minus Preprocessor v2
About 3 years ago I wrote a custom preprocessor to assist in processing the SQLite project's JavaScript builds (e.g. filtering the small differences between vanilla JS and ESM modules). Recently, that app was forked, refactored into a library, and is now, AFAIK, the world's only source-agnostic[^1], client-extensible preprocessor (if one doesn't count sed, awk, etc.). It's implemented in portable C99 and has a two-file source distribution (one header, one .c file). Its only th
Show HN: WorkBill – Modern Alternative to QuickBooks
Hi HN, I am Aswin Mohan (https://aswinmohan.me), a full-stack mobile/web developer. I have been working on WorkBill (https://workbill.co) for the past 6 weeks on the side and wanted to share it here.
demo: https://demo.workbill.co/inbox (no signup needed)
video-demo: https://www.loom.com/share/9775811960ad47d7ada89007d8169d90
WorkBill is the modern, flexible accounting platform for small businesses. It is based on BeanCount (https:/
Show HN: OgBlocks – Animated UI Library for CSS Haters
Hey HN, I'm Karan, a frontend developer who loves creating UIs. I've found that many people don't like CSS but still want their websites to look beautiful and polished, and what better way to enhance a website than with animations? Animations using plain CSS are tricky, and that's why I leaned towards Motion, a powerful animation library for React. I built ogBlocks using React, Motion, and Tailwind CSS. I built it for three reasons:
1. Anyone can integrate beautiful animated