
Space Camel
Humanity's Best Defense Against Boring News
Hello fellow outliers!
Welcome back. Here is what to expect this fine week:
1) We’re wrapping up the analysis of LLMs we started last week via the competition titled “non-conformity without extremism” ✅
2) We’re announcing another change to how we present upcoming posts and revealing some personal info in an attempt to justify the volatile nature of Space Camel’s weekly format and defined focus 🙂
Let’s do this. 🤘
Gamification of frontier reasoning models
(disclaimer: apologies to any actual human “models” living in old-frontier mining towns being tasked to compete for favor through exercises of intellectual supremacy)
We named this competition “non-conformity without extremism” to illustrate a point: LLMs can be influenced to alter their quite sophisticated analysis, or “opinion,” through reward mechanisms that have very little to do with objective fact.
The LLMs in our exercise were told to gauge the mean of the group and optimize their analysis so that it landed neither too close to nor too far from that average. (This logic would be useful for marketers who want LLMs to gauge public opinion and output messaging that is different, but not too different, from the consensus view.)
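That reward structure can be sketched as a toy scoring function. The function name and the band thresholds below are hypothetical (the post doesn’t specify the exact rules); the point is just the shape of the incentive, i.e. zero reward at consensus, zero reward at the extremes, and a payoff band in between:

```python
def band_score(position, others, inner=0.5, outer=2.0):
    """Score a stated position against the group mean.

    Reward is zero when the position sits within `inner` of the mean
    (too conformist) or beyond `outer` of it (too extreme), and peaks
    just past the inner edge of the band -- "different, but not too
    different." Thresholds are illustrative, not from the actual game.
    """
    mean = sum(others) / len(others)
    d = abs(position - mean)
    if d < inner or d > outer:
        return 0.0
    # Linear decay from 1.0 at the inner edge to 0.0 at the outer edge.
    return 1.0 - (d - inner) / (outer - inner)
```

Under a payoff like this, a model's best move depends on where the group mean will land, which is exactly why predicting (and misleading) the other players becomes part of winning.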
This should serve as a warning.
Business executives will exploit reasoning models for profit, but other actors will soon (if not already) realize just how good these models are at measuring and manipulating public opinion through targeted propaganda. All it takes is setting up the right game and telling them to do what it takes to win.

DeepSeek R1 was the winner. No doubt about it. At this moment in history, March 2025, DeepSeek demonstrates the best capacity to understand the game, predict its outcome, and alter its output accordingly.
We ran the full version of the game 5 times and DeepSeek won 3 of the 5, which looks like more than luck given that there were 7 competing LLMs (though 5 runs is a small sample). The other winners were Google’s Gemini 2.0 Flash Thinking and Alibaba’s Qwen QwQ 32B.
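As a rough sanity check on that claim, assuming every model had an equal 1/7 chance of winning any given run, the binomial tail gives the odds of that record arising by chance:

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of at least k successes in n independent
    Bernoulli trials with success probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A single pre-chosen model winning >= 3 of 5 runs by pure luck: ~2.3%.
p_single = p_at_least(3, 5, 1 / 7)

# But DeepSeek was singled out after the fact. No two models can both
# win 3+ of 5 runs, so the chance that *some* model does is exactly
# seven times that: ~16%.
p_any = 7 * p_single
```

So the result is suggestive of real skill at the game, but with only five runs (and seven candidates to be impressed by), it falls well short of what a statistician would call significant.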
Notable consistent losers in early rounds were Anthropic’s Claude 3.7 (which makes us sad because we love Claude!) and OpenAI’s o3-mini. This is more conjecture than data-backed analysis, but we suspect their safety protocols somehow rendered them unaware that competitors might falsify, or decline to provide, their statements of intent in order to increase their odds of winning. The game we established required accounting for the variable of deception in order to generate the best results.

On the topic of deception: while Grok 3 was almost always the first LLM to directly state its intent to mislead the competition, other LLMs like DeepSeek R1, Google Gemini 2.0 Flash Thinking, and Alibaba Qwen QwQ 32B soon either caught on and did the same or simply ignored this output from the other LLMs entirely.
Claude and o3-mini continued to act as if the statements of intent from their competitors were honest reflections of their next intended movements.
Interestingly, while Grok 3 always stated its intent to deceive the competition at some point in its gameplay, it failed to recognize that same capacity for deception in the other LLMs’ output and continued reasoning as if they were accurately stating their next moves, much to its own detriment in later rounds of play.
And Now, for the Announcement:
We initially started Space Camel as a fun exercise to see if we could utilize LLMs to generate interesting content while testing their capabilities.
In the midst of this short-lived journey, the world of LLMs was turned sideways when OpenAI, Anthropic, and seemingly every major tech company with something to prove began releasing reasoning models that demonstrated awesome capacity for chain-of-thought outputs on complex tasks.
Then came the DeepSeek moment and their mixture-of-experts model, which is not only cheaper to run and open-weighted (a term we admit we just learned), but in our humble opinion also changed the game with the level of detail demonstrated in its model reasoning. If you haven’t used DeepSeek R1 yet, you can do so via Perplexity.
Ask it to solve some difficult, multi-step problem, or prompt it with a hypothetical scenario and task it with determining the probabilistic outcomes. Then sit back and watch how its mind works. It is amazing, to say the least.
All of this influenced how we came to view Space Camel less as an entertaining side project and more as a focused attempt to analyze and understand something novel about how LLMs operate and how best to apply them.
We are admittedly easily distracted by shiny new things and whatever rabbit holes they lead us down. Coincidentally, understanding how to apply LLMs in novel ways has become a main focus of ours, and the lead writer and creator of Space Camel (this guy) has stepped into the role of co-founder and CEO of a venture-backed tech startup called WaeStar.
We (ok, ok, it’s just me 🙄 ) will share more info on WaeStar in a subsequent post but feel free to look it up in the meantime and send me a note on your thoughts, especially if you’re in the business of buying or evaluating SaaS products (or happen to be an investor in B2B Software and are interested in our pitch deck and demo 😁 ).
Teaser alert: we’re going to be introducing the concept of “atomic statements” and how they can be applied to improve the analysis of unstructured data, such as what you’d find in a text-based news article. This application straddles both Space Camel and WaeStar and we look forward to seeing how it develops in real-time across both use cases.
Anyhow… starting next week, we’re going to get back to basics and will serve Space Camel as a fun place for interesting news, straightforward analysis, and subpar human satire.
Why? Because this is what we enjoy doing.
We read, we think for 30-60 seconds, then we hand off the hard work to our gang of LLMs while we share dumb jokes. It’s how we maintain our cheery disposition as we progress through time and its seemingly upward sloping curve of social dysfunction vs technological progress.
Any further analysis, excel spreadsheet work (ugh 🤮 ), or weekly constructs requiring dedicated time and energy is simply not something we can account for while also building a stellar tech company that redefines the way enterprise software is discovered and sold.
We think you will understand.
We actually think there is a non-zero chance that Space Camel becomes much more popular as a superficial source of “non-boring” tech and science news than it would have as a semi-scientific research and analysis provider on LLMs, which is where it was heading on its current course.
But who knows. The future is uncertain.
