ebaths ([personal profile] ebaths) wrote 2025-08-10 03:54 pm

AI Can Barely Play Ice Cream Truck Simulator

About Vending-Bench

Recently, I read an AI/NLP paper by Andon Labs called “Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents.” Here’s the press release version, and here’s the actual paper itself. The idea is that we are hearing a lot about autonomous AI agents lately—so let’s create a benchmark test which can give us some comparisons on how different agents do on a simple autonomous task. In this case, it’s running a vending machine, which includes finding good products, ordering those products, stocking and restocking the actual machine, and trying to make as much money as possible. On their press release page, you can try playing the game yourself for a couple turns—it’s basically a more realistic version of those old browser games where you have to run an ice cream truck or pizza parlor.

The paper is definitely worth the time it takes to read the whole thing, but the basic result is that the agents are generally coherent and able to keep the business running, but when failure occurs, it is catastrophic. (This is probably not surprising.)

Fun Failures

( interesting examples of catastrophic failures )

Results, Memory, Context

( the results of the paper, paraphrased )

The Medium Article and Consciousness

( an off-topic tirade about whether chatbots are alive )

Conclusions

I think I went a bit off-topic here, haha. If you’re interested in the Vending-Bench concept, Andon Labs also did an experiment with Anthropic where they had Anthropic’s AI, Claude, run a real vending machine inside their offices. It’s a good read, and covers many of the same topics as the original paper but in more of a blog post format.

I don’t think we’re close to having these tools actually do jobs just as well as humans do. The fear I have, instead, is that they will be able to replace a person in an office job up to a certain point, and the people in charge will see that, say “good enough”, and let the LLMs have control that they shouldn’t.

blueshiftofdeath ([personal profile] blueshiftofdeath) wrote 2025-08-10 08:29 am

Careless People

Well that was depressing...

You may have heard of Careless People due to the author, former Facebook executive Sarah Wynn-Williams, being blocked from promoting the book, which predictably made it go viral. It's been described as "explosive" and you know what... it got me curious too!! I usually only read books that have been out for a while (not deliberately, it's just that I don't keep track of new books and they naturally make up the minority of books you could read) and I have to say that it's fun to read something recently released that Everybody's Talking About. Makes me want to stay more up to date with new releases in the future...

Review

( tldr, this is not fine literature, read for either the gossip/drama or critique of Facebook, not for the writing quality )

Thoughts

Character Analysis

The beginning of the book read almost like fiction to me, not because it was unbelievable or anything, but because of how it portrays its characters. When we first hear about Mark (Zuckerberg) and Sheryl (Sandberg), it feels like I'm getting introduced to these people for the first time, rather than immediately establishing them as figures I'm familiar with in real life. I thought the character arcs Sarah presented were pretty interesting, and I'm compelled to analyze them like I would for fiction (and to be fair, I tend to analyze characters in fiction as if they were real).

All The Careless People

The Facebook executives in the book have a lot of shared traits, which I think is worth covering before getting into the individuals.

( cut for length )

Mark

( tldr mark is just stupid and yes it's actually sad )

Sheryl

( tldr sheryl is sociopathic )

Sarah

( sarah has several problems but I basically think she's valid )

What Could Have Been

Sarah insists that Facebook really could have been a force for good:

I think all the time about how the company looked to me before I joined. All the possibility of it, the promise of connecting everyone in the world. How I was so sure that Facebook would change the world for the better. The Facebook I saw then has been corrupted.

In the early days, when I traveled anywhere in the world with Mark, people would approach us and pour out heartfelt stories of how the platform changed their lives; how they reconnected with someone who became their husband or wife; how they made new, life-changing friendships; how it helped them start their businesses; how they were all alone—immigrants to a new country like me, gay kids in conservative towns, people with rare diseases and no one to talk to about their care—and found community on Facebook. It felt promising and vast, and sometimes actually historic.

Now I'm consumed by the worst of it. The grief and sorrow of it. How Facebook is helping some of the worst people in the world do terrible things. How it's an astonishingly effective machine to turn people against each other. And monitor people at a scale that was never possible before. And manipulate them. It's an incredibly valuable tool for the most autocratic, oppressive regimes, because it gives them exactly what those regimes need: direct access into what people are saying from the top to bottom of society.

This leads into a section I quoted earlier, where she insists that "it didn't have to be like this," and the Facebook executives had every opportunity to do something different, the implication seeming to be that the initial positivity and magic of Facebook could have been its continuing legacy.

If you've read my other social media related posts, you'll know that I question this. There's a reason every single social media platform's path seems to play out in the same way. Everyone reports these magical pro-social interactions in the beginning (like Sarah does), only for everything to take a dark turn. I also have additional thoughts specific to Facebook's situation based on what we heard in Careless People.

The Prime Directive

( no ethical social media under late capitalism )

The Founding of Facebook

( Facebook also was born from shit )

Centralization

( one platform to rule them all seems like a bad idea anyway )

Now what?

( there are better options in spite of everything... but not enough options )