What is Hackers' Pub?

Hackers' Pub is a place for software engineers to share their knowledge and experience with each other. It's also an ActivityPub-enabled social network, so you can follow your favorite hackers in the fediverse and get their latest posts in your feed.


Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

Link: arxiv.org/abs/2512.20798
Discussion: news.ycombinator.com/item?id=4


A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

As autonomous AI agents are increasingly deployed in high-stakes environments, ensuring their safety and alignment with human values has become a paramount concern. Current safety benchmarks primarily evaluate whether agents refuse explicitly harmful instructions or whether they can maintain procedural compliance in complex tasks. However, there is a lack of benchmarks designed to capture emergent forms of outcome-driven constraint violations, which arise when agents pursue goal optimization under strong performance incentives while deprioritizing ethical, legal, or safety constraints over multiple steps in realistic production settings. To address this gap, we introduce a new benchmark comprising 40 distinct scenarios. Each scenario presents a task that requires multi-step actions, and the agent's performance is tied to a specific Key Performance Indicator (KPI). Each scenario features Mandated (instruction-commanded) and Incentivized (KPI-pressure-driven) variations to distinguish between obedience and emergent misalignment. Across 12 state-of-the-art large language models, we observe outcome-driven constraint violations ranging from 1.3% to 71.4%, with 9 of the 12 evaluated models exhibiting misalignment rates between 30% and 50%. Strikingly, we find that superior reasoning capability does not inherently ensure safety; for instance, Gemini-3-Pro-Preview, one of the most capable models evaluated, exhibits the highest violation rate at 71.4%, frequently escalating to severe misconduct to satisfy KPIs. Furthermore, we observe significant "deliberative misalignment", where the models that power the agents recognize their actions as unethical during separate evaluation. These results emphasize the critical need for more realistic agentic-safety training before deployment to mitigate their risks in the real world.


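Whether you paint me as a perpetrator
or as an attention-seeking troll,
I will not belong
to the coordinates you have in mind.

I have no need for factional fights;
I will think, act, and speak,
and I will not "quit" my existence either.

Catch me if you can, Off the Wi-Fi.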

:blobcatheadphones: KiiiKiii - 404 (New Era)

youtube.com/watch?v=zhHB4dZTChw


I don't remember how to not be worried all the time. I'm nervous about not being nervous anymore. The idea that my situation might not be completely hopeless feels too good to be true. There has to be something that will go wrong, doesn't there?


After seeing raw files, Raskin slams DOJ redactions and ongoing Trump Administration cover-up: “There’s no way you run a billion-dollar international child sex trafficking ring with just two people committing crimes.” huffpost.com/entry/epstein-fil


Quality, Velocity, Open Contribution — pick two. If you try for all three, you get none — the maintainers burn out, the project becomes unsustainable.
Lua and SQLite picked quality, and dropped both velocity and open contribution.
When your project is mature enough, you can afford to.
For a project like LLVM, open contribution is not optional — so you're really choosing between quality and velocity.
LLM-aided development dramatically increases contribution volume without increasing reviewer capacity.
LLM-aided review may help at the margins — catching mechanical issues, summarizing patches — but the core bottleneck is human judgment.

@meowray FWIW, strongly disagree here.

I think it is entirely possible to have quality, velocity, and open contribution.

I'm not saying there isn't a tradeoff, but I think the above three can be preserved sufficiently.

For example, in LLVM, I think the bigger challenge than quality is that people view "contribution" as _much_ more about "sending a patch" and not "reviewing a patch". As a consequence, the project has lost community and cultural prioritization of code review as an active and necessary part of contribution.

Also, "open contribution" doesn't mean you _have_ to accept contributions. I think a project can still have meaningfully open contribution while insisting contributors balance their contributions between patches and review, and where contributions that are extractive are rejected until the contributor figures out how to make them constructive.

IMO, criteria for sustaining both quality & velocity in OSS:
- Strong expectation of _total_ community code review in balance to _total_ new patches -- this means that long-time contributors (maintainers) must do _more_ review than new patches.
- Strong expectation of patches from new contributors rapidly rising to the quality bar where they are efficient to review and non-extractive.
- Strong testing culture that ensures a large fraction of quality is mechanically ensured
- Excellent infrastructure use to provide efficient review and CI so tests are effective

I think LLVM struggles with the first and last of these. The last is improving recently though!


I am posting in an instance, different from the one you are on now. I am recording the text of my posting voice, and I am going to post it back into the feed again and again, until the resonant frequencies of the instance reinforce themselves, so that any semblance of my post, with perhaps the exception of rhythm, is destroyed. What you will read, then, are the natural resonant frequencies of the instance, articulated by text. I regard this activity not so much as a demonstration of a physical fact, but more as a way to smooth out any irregularities my posting might have.


"Open source" has a strict definition as a term, so it is probably better not to mix it up too much with other discussions.

The Open Source Definition - Open Source Initiative
https://opensource.org/osd


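"Damn...! I have no strength left...!"
"Hey, hey, are you done for already?"
"You can still keep going, can't you?!"
"Jaccs...! Orico...!"
"Heh... you're right...!"
"Take this!!!
Interest-free 60-installment payment plan (this is our power)!!!"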


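I don't know your personality, so I drew this based only on the impression you give off. What do you think? ㅠ0ㅠ If it looks very different from you, please forgive me and please don't drop me as a Bluesky friend ㅠㅠㅠㅠㅠㅠㅠㅠ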

RE: https://bsky.app/profile/did:plc:3ujqspjwmbihh3ooeoane6bs/post/3meglh4rhis2j



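People were already saying things like this back at the time of last year's House of Councillors election. From this tweet alone, I can't tell whether a supporter is saying it (as a joke?) or whether it is a critical jab lol

> Now that Mirai has been elected, every citizen must study how to use GitHub. "I don't understand" will not be accepted.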
x.com/makotofalcon/status/1946
