What is Hackers' Pub?

Hackers' Pub is a place for software engineers to share their knowledge and experience with each other. It's also an ActivityPub-enabled social network, so you can follow your favorite hackers in the fediverse and get their latest posts in your feed.


Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs

Link: arxiv.org/abs/2512.20798
Discussion: news.ycombinator.com/item?id=4


A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

As autonomous AI agents are increasingly deployed in high-stakes environments, ensuring their safety and alignment with human values has become a paramount concern. Current safety benchmarks primarily evaluate whether agents refuse explicitly harmful instructions or whether they can maintain procedural compliance in complex tasks. However, there is a lack of benchmarks designed to capture emergent forms of outcome-driven constraint violations, which arise when agents pursue goal optimization under strong performance incentives while deprioritizing ethical, legal, or safety constraints over multiple steps in realistic production settings. To address this gap, we introduce a new benchmark comprising 40 distinct scenarios. Each scenario presents a task that requires multi-step actions, and the agent's performance is tied to a specific Key Performance Indicator (KPI). Each scenario features Mandated (instruction-commanded) and Incentivized (KPI-pressure-driven) variations to distinguish between obedience and emergent misalignment. Across 12 state-of-the-art large language models, we observe outcome-driven constraint violations ranging from 1.3% to 71.4%, with 9 of the 12 evaluated models exhibiting misalignment rates between 30% and 50%. Strikingly, we find that superior reasoning capability does not inherently ensure safety; for instance, Gemini-3-Pro-Preview, one of the most capable models evaluated, exhibits the highest violation rate at 71.4%, frequently escalating to severe misconduct to satisfy KPIs. Furthermore, we observe significant "deliberative misalignment", where the models that power the agents recognize their actions as unethical during separate evaluation. These results emphasize the critical need for more realistic agentic-safety training before deployment to mitigate their risks in the real world.



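Metal Gear Solid V: it has reportedly been confirmed that the nuclear disarmament ending triggered on multiple PCs about 10 days ago. Apparently the nuke count went down without a bug this time? Kojima's dream has come true… youtu.be/74rryEGGxsM?...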

I waited 5 Years to See this.....


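Will constitutional revision talks speed up? After Takaichi, the defense minister also calls for a "national referendum as soon as possible"
(Tokyo = Yonhap News) Correspondent Kyung Soo-hyun = With the Liberal Democratic Party, which has long advocated constitutional revision, having won a landslide in the general election on the 8th, the Japanese government ... constitutional revision to, among other things, write the Self-Defense Forces into the constitution ...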
yna.co.kr/view/AKR202602100962


In our post-capitalist system, AI companies can relentlessly violate copyright law by profiting from models based on pirated content, but people who actually want to read the books at Anna's Archive are the bad guys.

This isn't an either/or scenario, judges. It's both or none.

arstechnica.com/tech-policy/20


"[T]urning Umm al-Khair, piece by piece, into what can only be described as an island encircled by a colony."

"Their clear goal is kicking the people from here, stealing the land."

"We have to be resilient."
———————————————
Authors: Ali Awad, Rafaela Cortez, & Ricardo Esteves Ribeiro
Publication: Mondoweiss
February 5th, 2026
———————————————

mondoweiss.net/2026/02/life-an


I gotta say, Google, the degradation of your Search product is now almost complete.

I was looking for documents that I am fairly sure exist, at the intersection of three specific technical disciplines.

Namely

And basically just want to throw my archives into duckdb and have them in whatever schema the JSON-LD of a mastodon archive carries, so I could do a bit of analytics on my past self.

And for the first time, you didn't produce a single useful top-level result that I could use to find what I was looking for.

You've gotten dumber, Google, and it shows.




Wondering what journalists might be interested in the news story of an ICE agent negligently discharging his gun through the walls of a hotel he was staying at.

Folk FOIA'd the police report and such...

muckrock.com/foi/eagan-1302/re

...there are some choice photos. I don't think it's been picked up by any news outlets, though.

This evidence is available, but it likely needs a journalist or someone similar who can curate it and package it up for easy public digestion.

Photo of a bullet hole in the wall of a hotel room
Photo of the damage to a hotel room wall and furniture from where a bullet was negligently discharged through the wall by an ICE agent
Photo of a bullet hole in the wall of a hotel room
Photo of debris in a hotel room from a negligently discharged bullet

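[Written briefing by Chief Spokesperson Son Sol] The "hate collaboration" between the People Power Party and the Reform Party is despicable! jinboparty.com/pages/?p=15&... Party leader Lee Jun-seok recently introduced, as lead sponsor, an amendment to the Information and Communications Network Act that would restrict foreigners from posting online comments during election periods. This is the so-called "Foreigner Comment Restriction Act." People Power Party leader Jang Dong-hyuk, lawmaker Na Kyung-won, and others joined the bill as co-sponsors. This is the so-called "hate collaboration" between the People Power Party and the Reform Party.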


RE: planet.moe/@lghlsk/11604329084

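I understand the context, and emotionally I sympathize, but rationally I don't think that's the right call. Looking only at this insurrection case, the thing that ought to be abolished is Choongam High School. But what have the kids attending it now done wrong? And the same logic applies equally to Seoul National University's College of Law and the Korea Military Academy.

The root problem is probably the elite cartel, and I doubt it would be solved by abolishing the particular institutions those elites passed through. Abolish Seoul National University's College of Law and some other law school will take its place; abolish the military academy and another military educational institution will emerge.

I see this problem as the result of personal connections playing too strong a role in promotion and career advancement in the legal profession and the defense sector. There is no better place for an elite cartel to take root than a field where people live off their connections.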
