Some aspects of the described behavior are as we intended and some are not. The cause is not exactly as described in the blog post. As for mitigation, we are already testing a patch of the unintended behavior on a subset of our infrastructure. If any of you try to reproduce the blog post's findings you may get confusing results throughout the day.
We will also re-evaluate whether the intended behaviors are acceptable or not. Some of this is a trade-off between multiple aspects of privacy, and multiple aspects of user experience.
Please note that this is my current understanding, which may change. I was only made aware of this an hour ago, and most of that time was spent talking with Ops, considering what to do immediately, and writing this post.
Finally, for those of you who do security research: when you find a security or privacy issue, please consider notifying the maintainer/vendor before publishing your findings, even if you intend to publish right away.
And not just Amazon. It feels like all of big tech (and some smaller firms) have simultaneously gone insane. Imagine if your CEO woke up one day and told the company: "We need to encourage travel spending. Please book as many business trips as you can, and spend as much money as possible. Fly first class to our satellite offices! Take limos instead of Ubers! Eat at fine restaurants! Make sure you are constantly traveling. In fact, we are going to make Travel Spending part of your annual performance review: If you don't spend enough on business travel, you'll get a low rating!"
> BTW, I approached ABC about buying back the former FiveThirtyEight IP*, and they said they wouldn't sell at any price because I'd criticized their management of the brand.
Hi! I'm one of the programmers at Gutenberg.
We've been improving the site a lot over the past few months (and more is coming!).
If you haven't visited the page recently, it's worth checking out again: https://www.gutenberg.org/
I'm pretty sure he's talking about companies and people outsourcing their decision making and thinking to AI and not really about using AI itself.
I don't think using AI to write code is AI psychosis or bad at all, but if you just prompt the AI and believe whatever it tells you, then you have AI psychosis. You see this a lot with finance people and VCs on Twitter. They literally post screenshots of ChatGPT as their thinking and reasoning about a topic instead of doing even a little bit of thinking themselves.
These things are dog shit when it comes to ideas, thinking, or providing advice, because they are pattern matchers: they are just going to give you the pattern they see. Most people notice this if they try to talk to one about an idea. It often just spits out the most generic dog shit.
They are, however, pretty useful for certain tasks where pattern matching is actually beneficial, like writing code. But again, you just can't let them do the thinking and decision making.
I think it's quite a different experience going all Jackson Pollock with AI in your own studio on your own terms, compared to the sorry state of affairs of having hundreds of Pollocks throwing paint around wildly within a corp to meet a paint quota.
> The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.
This is incredibly good for science. arXiv is free, but it's a privilege not a right!
Besides the people in this thread bemoaning the state of research funding, international students, etc. (all of which are valid), a lot of people are becoming disillusioned with academia. Probably 80% of the recent PhD grads I know are looking to leave academia, despite the fact that they went into it to pursue a career in academia. The median science PhD takes 6 years now, and is grueling work for terrible pay, all for difficult job prospects given the current market. MIT recently became one of the first universities to get a grad student union to try and combat the increasingly exploitative nature of academia. I can see how undergrads may look at how AI can do most of their homework assignments, and see how miserable grad students are, and decide that they don't want to continue down that path.
On top of that, if you look at 'Pointers & ownership' and 'Collections' sections, the Bun codebase is already prepared, using internal smart pointer types that map 1-to-1 to Rust equivalents, and `bun_collections` Rust crate already exists.
This gives the impression that the rewrite was prepared a long time ago and was the Bun team's proposition to Anthropic during the acquisition deal.
I think AI rescue consulting is going to become a significant mode of high-value consulting, similar to specialists who come in to deal with a security breach or do data recovery.
Purely AI-written systems will scale to a point of complexity that no human can ever understand. The defect close rate will taper down, the token burn per defect will scale up, and eventually AI changes will cause more defects on average than they close, leaving the whole system unstable. It will become a special kind of process to clean-room out such a mess and rebuild it fresh (probably still with AI) after distilling out core design principles to avoid catastrophic breakdown.
Somewhere in the future, the new software engineering will be primarily about principles to avoid this in the first place, but it will take us 20 years to learn them, just like the original software engineering took a lot longer than expected to reach a stable set of design principles (and people still argue about them!).
> I feel like I'm in a different field compared to the rest of hacker news.
And below you repeat what all the Hacker News hypemen say about AI ("I have stopped writing code", "it's mature and the next step of engineering").
Thank you for reinforcing OP's point.
EDIT: you're the same person that a month ago said your company feels git is outdated now that you have agentic coding, and you don't even need to write your own commit messages. This is next-level trolling, or a serious case of AI psychosis.
I can't relate that much to this. Every time I use AI to write code, I'm constantly fighting a feeling on the back of my neck that I need to look over everything it has done and supplement/alter it with my own code. That ick feeling counteracts the dopamine hit of having a working app after a few minutes of vibe coding, and I don't think that's going anywhere anytime soon.
That said, I have experience. I could absolutely see myself falling into this as a junior or even mid level dev. I'd no doubt not feel that feeling on my neck if it wasn't scarred from code review lashings early in my career by knowledgeable mentors.
Correct. I use AI a ton and I'm having more fun every day than I ever did before thanks to it (on average, highs are higher, lows are lower). Your characterization is all very accurate. Thank you.
> Even after the modem is removed, if you connect your phone to the car via Bluetooth then the car will use your phone as an internet connection and send all the same telemetry data back to Toyota. However, if you use a wired USB connection then it does not do that (see the discussion here and elsewhere), so I exclusively use CarPlay via USB.
The problem with this is that both CarPlay and Android Auto capture their own vehicle telemetry. So even though the car is not able to use your phone as a general data pipe, Google and Apple still get access to this data when you're connected.
They are both very cagey with how they talk about this (or don't).
I used to work with a brilliant and humble guy. He got accepted to MIT at 14, but his parents made him go to community college for a year to give him a little more time to mature. He then went to MIT and graduated after three years, then went to Berkeley and got a masters in one year, then went to Stanford, where it somehow took six years to get his PhD.
Why? Because his advisor milked him for his work. She had a pile of papers to peer review? Hand them off to the grad students. Have a talk to give? Give the grad students the task of writing up first drafts, collecting data, generating graphs, etc. My friend said that nothing in the first five years of his PhD work contributed to his dissertation.
I'm amazed that behavior like that of the advisor is allowed.
I followed the link to the Pixel 9 bug/exploit and saw this:
"Over the past few years, several AI-powered features have been added to mobile phones that allow users to better search and understand their messages. One effect of this change is increased 0-click attack surface, as efficient analysis often requires message media to be decoded before the message is opened by the user"
Haven't we learned our lesson on this? Don't read and act on my SMS messages without me asking you to!
Sounds like a tactical tornado; it made me think of this paragraph:
“Almost every software development organization has at least one developer who takes tactical programming to the extreme: a tactical tornado. The tactical tornado is a prolific programmer who pumps out code far faster than others but works in a totally tactical fashion. When it comes to implementing a quick feature, nobody gets it done faster than the tactical tornado. In some organizations, management treats tactical tornadoes as heroes. However, tactical tornadoes leave behind a wake of destruction. They are rarely considered heroes by the engineers who must work with their code in the future. Typically, other engineers must clean up the messes left behind by the tactical tornado, which makes it appear that those engineers (who are the real heroes) are making slower progress than the tactical tornado.”
- John Ousterhout, A Philosophy of Software Design
I say this as somebody who has worked vendor side in UK public sector for a number of years.
It's policy. It's official Whitehall policy.
As a department you can't hire programmers at £100k/year, because that pushes them way, way higher than civil service bands allow. But you can pay a "Systems Integrator" - a consultancy like Cap Gemini, Deloitte, Fujitsu - £600/day for the same programmer in the same seat. So, £100k/year = bad, £120k/year via an external consultancy = good.
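For those outside the UK contracting world, the day-rate arithmetic works out roughly like this. A minimal sketch; the figures come from the comparison above, and the 200 billable days per year is my own assumption (a UK working year after leave and bank holidays):

```python
# Hypothetical comparison: salaried hire vs. the same person billed
# through a "Systems Integrator" at a day rate.
salary = 100_000      # direct hire, GBP/year (the "bad" option)
day_rate = 600        # consultancy day rate, GBP/day
billable_days = 200   # assumed billable days per year

contractor_annual = day_rate * billable_days
premium = contractor_annual - salary

print(contractor_annual)  # 120000
print(premium)            # 20000 extra for the same programmer, same seat
```

So the department pays a premium of around 20% (more at higher billable-day counts) and the consultancy keeps the margin, but the spend sits in a budget line the civil service bands don't constrain.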
Then we get into actually building and owning tech. Look at the history of GDS - they were empowered to pay half decent salaries and build and own things, but then had budgets slashed and programs cut. Why? Because we can "just buy it". Yes, you won't own the IP, it'll cost 4x as much, it'll take 3x-5x longer, but at least you won't have "inefficient civil service bloat" to have to manage.
This all started in the 1980s, and there are signs of it swinging back. I was at one department last year where they were telling me they're thinking about hiring actual engineers and embedding some devops capability internally - absolutely jaw-droppingly revolutionary. Genuinely.
As a lawyer, I'm excited about this, but there are two roadblocks that I'm not sure how Anthropic will navigate:
(1) For non-lawyers who use these skills/connectors/whatchamacallits to try to get legal advice, their communications are not protected by attorney-client privilege. This will absolutely bite some people in the ass.
(2) If a lawyer uses this with confidential client information (which, to the uninitiated, doesn't just mean SSNs and bank account numbers, but "all information relating to the representation of a client") and forgets to toggle off "Help improve Claude" in their settings, they have possibly (maybe even likely) committed malpractice.[1]
> open source server code if you are going to cease support
When I was a senior exec at a big public tech company, there was a product we decided to discontinue and we thought would be nice to just open source. Somehow I ended up in charge of managing that process and was shocked at how complex, time-consuming and expensive it was in a multi-billion dollar, publicly-traded corp vs some code my friends and I wrote.
Legal had to verify that there was no licensed library code used and that we had clear, valid copyright to everything there. The project had been written over several years, merged with a project we'd acquired with a startup, some key people weren't around any more, the source control had transitioned across multiple platforms, etc. And even once we nailed all that down sufficiently, we didn't get an "all clear" from legal, we just got a formal legal opinion that any liability was probably under $1M. And then we had to convince an SVP to endorse that assumption of $1M potential liability and make a business case for approval to the CEO.
For a public company, the default assumption for any online game would be "the server side code WILL be open sourced" (under threat of prosecution). That means legal would mandate "No commercially licensed libraries can be used, any open source libraries will have to be vetted to ensure the license is compatible and everything else will need to pass IP and compliance audit." That will certainly have an impact on development time frames and economics.
I have been bothering the VM team for years for VM GPU pass through. I worked on the Apple Silicon Mac Pro and it would have made way more sense if you could run a linux VM and pass through the GPU that goes inside the case!
Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!
Don't. You are exactly the wrong kind of firm to be pursuing SOC2.
SOC2 is like the corporate GPL of security. It's an infectious secret handshake company security teams swap in lieu of filling out security questionnaires. Nobody savvy takes it seriously.
There will come a time where your business will grow to the point where it makes sense to pay for the secret handshake. The overwhelming most likely scenario in which that happens is a purchase order made contingent on your SOC2 Type I attestation, where the revenue from that purchase order more than pays for the attestation.
Do not ever do a SOC2 speculatively, in the hopes that it will improve your sales prospects. Plenty of successful firms don't have SOC2s. If you're losing sales where SOC2 is a factor, you didn't have those sales to begin with.
Having spent time working in UK healthcare tech, I never understood why everyone was lining up to throw buckets of money at Palantir. Quite apart from being obviously evil and so on, none of their solutions were actually very good.
Unfortunately, it's hard to escape the feeling that friends in high places, some lobbying and some er... reciprocal back scratching might have been instrumental.
See also senior staff at NHS England (or NHS Digital? can't remember) handing massive NHS compute contracts to AWS, and then leaving the civil service to become... an AWS employee.
> Purely AI written systems will scale to a point of complexity that no human can ever understand and the defect close rate will taper down and the token burn per defect rate scale up and eventually AI changes will cause on average more defects than they close and the whole system will be unstable.
Wow, it’s true, AI really is set to match human performance on large, complex software systems! ;)
It seems like the fair solution to this problem is to open source server code if you are going to cease support for an online game. That way the community has the opportunity to run their own servers if they want to.
I also really support giving 60 days' notice if an online game is going to shut down. Places I have worked have had policies like that for games they are sunsetting, and I think the best game publishers think a lot about how to do that operation. It's not simple, because if people think a game is going away, their behavior changes. And nothing sucks like buying online content for a game right before it shuts down. No matter what you do, people will tell you they didn't know the game was shutting down. And if you give away content that you previously sold, that also sometimes angers the community.
The problem is when companies know a game isn't working they tend to want to shut it down right away because the money they spend keeping it up is never coming back. And maybe the company is going to die too. So I do support a law for a 60 day notice.
Which goes to prove that the bottleneck isn't in writing the code. It's in reading and understanding the code.
We all had that one "productive" engineer on our team who would write huge PRs with large swaths of refactoring, whether warranted or not, and that was way before anyone could even imagine in their wildest dreams that neural networks could generate such huge amounts of code.
The net effect of such a "productive" engineer was never an increase in team velocity. Instead, the team would slow to a crawl, because either his PR had to be reviewed in detail, eating up everyone's time, or, if you just gave it a cursory LGTM, it blew up in production, forcing everyone back to the drawing board. Meanwhile, the project architecture would have shifted so rapidly due to his "productivity" that no one had a clear picture of the codebase, such as what's where, except that one "super smart talented productive loyal to the company goals" guy.
In my experience, Claude only knows how to spew code. Every problem you want it to solve, it translates into "more code" rather than "less code". You have to very closely code review everything it does, otherwise your codebase is going to just grow and grow, and asymptotically approach 100% debt.
I code review everything that Claude produces, and I'd estimate about 90-95% of the time, my reaction is WOW it works but too much code dude, let's take 3 hours to handhold you through simplifying it until nothing more can be removed.
I watched a pickup roll coal in the middle of freaking East Bay, literally within sight of downtown San Francisco, on a bicyclist. I reported their license to the California Air Resources Board, and not long after that I saw it up on jacks in a neighborhood auto shop. That made my day. Asshole.