How you could build a search that the fediverse would welcome

Mastodon and the fediverse are clearly taking off, bringing in millions of new users, and also organically inspiring a wave of technical innovation that dwarfs anything the bribes and empty promises of the Web3 crypto bubble ever produced. I'm even enjoying having settled into a relatively permanent new fediverse address at @anildash@me.dm, on Medium's new Mastodon instance, which (along with Mozilla's similar upcoming instance) should do a lot to legitimize the nascent open system. It's all great to see, though of course there are huge challenges that come along with this growth, and most of them (as always, with social media) are largely about people and culture, not technology.

Nothing exemplifies these opportunities and challenges better than search. Search has long been the killer app of the web, since the days of Yahoo and AltaVista on to the long reign of Google's dominance, to today's web where SEO is dying and TikTok is (inexplicably, to text-lovers like me) increasingly on the rise. In that complex environment, the intentional absence of substantial search features in the fediverse, especially in the flagship Mastodon experience that defines the nascent fediverse for so many new users, seems inexplicable. But search is also a signifier to those who pioneered and established the current era of the fediverse, symbolizing the extractive and exploitative hypergrowth systems that often ruined the positivity and promise of the human web.

Here's one of the most popular posts I'd ever shared on my first Mastodon account. I think it resonated so much because it articulates one of the key unspoken values of the fediverse.

Given that context, it's no surprise that a less experienced fediverse user, no matter how well-intentioned, might well accidentally antagonize some people while trying to build search features for Mastodon users. I've known Jan Lehnardt online for a long time, and known him to make lots of thoughtful things, but I'd missed his initial introduction of Searchtodon, an attempt at introducing personal search (not a public search engine) for the fediverse that I think a lot of long-time fediverse users could have anticipated would cause some consternation, despite his clear intent to be consensual and considered in his implementation. It's well worth reading his detailed retrospective on the reaction to the service.

I realize I've had an idea for how to implement Fediverse search for a long time that I've talked about privately with people but not shared broadly, and it's probably doing a disservice to potential implementors to not share some of my thinking. It also seems like it'd be great to float the idea as a concept rather than an implementation, so that people could react to the idea without the increased emotional urgency of feeling like they have to react to an existing potential threat.

Why search?

The first question to address when talking about creating a search to serve fediverse users is why do this in the first place? A lot of people would argue (to varying degrees of thoughtfulness) that this kind of capability shouldn't exist at all. Eugen Rochko, the lead developer and creator of Mastodon, and de facto administrator of community norms and consent for such features across the entire fediverse, addressed this many years ago in a post that I remember being largely non-controversial at the time.

But interestingly, even that core assertion would likely be strongly debated in an environment that, six years later, feels even more fraught. So let's lay out some key reasons for why a search capability might be a net positive for the fediverse, explore the arguments against it, and then share the parameters of a potential implementation that could deliver on the positives without triggering the negatives.

First, the benefits of search:

  • Search enables content to be citable and referenceable over time. The web itself was created as a medium for publishing, and then linking to (citing) those published works, and being able to search for an item vastly increases the odds that it will be cited or discovered later, increasing the ability to build knowledge, and especially allowing good ideas to be found later on even if their creators didn't initially have reach or reputation or social access which allowed their ideas to spread at the time they were published.
  • Search can enable people to document their credentials and authority. This is most vital for people who are part of vulnerable or marginalized communities, where coming with receipts is a fundamental part of asserting authority in a culture that might otherwise dismiss or diminish their work.
  • Search enables many kinds of fun, creative and meaningful expressions. From a technical perspective, hashtags and other forms of group sharing/discovery are just specific implementations of search. These are foundational for many kinds of play, like hashtags used for riffing on cultural events or trends, as well as the oft-cited impact of hashtags used for activism. But even if it doesn't follow the form of traditional hashtags, search enables community forming through collective recall of memorable moments or particular turns of phrase.
  • Search can be a tool for reflection, growth and learning. Many of us who've been on social media for years or decades have had the chance to look back at our own (or others') shared expressions and see how they've aged or changed in meaning over time, prompting a lot of reflection and reconsideration. Though the common perception of social media is shaped by its worst failure cases, where people are only ever increasingly polarized and never learn from their mistakes, the truth is that billions of people have been on these networks for many years, and in the vast majority of cases, even where past messages were flawed or even embarrassing, most people used those awkward moments from themselves or others as a prompt for personal growth.
  • Search can help us see our impact. Being able to search for mentions of ourselves or our work can help us learn from reactions and responses, help us find connection or community from those who respond or react to our work, or help us confront areas to improve by seeing thoughtful critique or corrections that come from those in our networks or beyond. This kind of search can also be a form of credibility, as showing these responses (or even just the sheer volume of reactions) to something we write or publish can often be a powerful tool for gaining audience or institutional support over time. I've gained an immense number of invaluable insights from people reacting to my writings here over the years, and most of that was thanks to being able to search for reactions to my work, even if a lot of cynical culture tends to see "ego searching" as always being vain or insecure.

Okay, then: if there are all these wonderful benefits to search, how can anyone be against it? Why is there such cultural resistance to search in many corners of the fediverse? It's good to understand these valid and important objections, especially as they're seldom voiced in the for-profit, investor-backed tech industry (much of which is dependent on surveillance-based search for either its growth or sustenance).

  • Search enables visibility, which means vulnerability for many people. Marginalized and vulnerable people are systematically targeted by hateful and often violent movements whenever they effectively communicate their ideas and advocate for their communities. As a result, making it easy for bad actors to search for, and subsequently target, these vulnerable people has been a constant, exhausting, and brutal reality of the social web for a generation. This takes literal form in realtime networks like Twitter, where we've seen (for example) homophobes in Saudi Arabia search for, and successfully identify, LGBTQ people whom they then targeted for legal persecution and imprisonment. Discoverability leads to literal danger for many, and the terms of visibility are often unilaterally changed by for-profit entities without consent or even notification of those affected. For example, Facebook has, multiple times, taken content that was private or limited only to friends, and made it both publicly-accessible and easily searchable even by those who would use it to target and harm those who had originally created that content.
  • Search removes context, which can open people up to being targeted. There are many years of academic research about "context collapse", which are worth exploring for those who want to get a firmer grounding in this concern, but at its most basic, it's easy for anyone to understand that words or ideas taken out of context and hyper-amplified can make someone a target for harassment or attacks. Sometimes this can even start from a well-intentioned attempt at accountability for someone who has, either intentionally or unintentionally, shared a harmful idea, but then snowballs into something far more destructive than even the original critic intended. And most of these scenarios are, at a technical level, enabled by search.
  • Search enables monetization of people without their consent. The web-scale search engines that define today's internet experience for billions of people have overwhelmingly operated without a concept of consent, creating increasingly aggressive norms for harvesting content and contributions without any compensation to those whose work is consumed and collected. Even the few tools that exist for "opt out" are often ignored, and more broadly the coercive effects of setting a default expectation of being opted in means that those who would prefer to not be indexed and included in massive search systems effectively have no choice if they want to participate in the contemporary digital world. This results in exploitations like nearly every digital artist who has ever written text or created an image being included in the training models for new AI systems, without ever explicitly consenting to that inclusion, and without any hope of ever being fairly compensated for the value they created for those systems. The vast majority of the base of training data for these systems were created under the pretense of improving search.
  • Search incentivizes increasing surveillance over time. Because search is still the most lucrative internet application that has ever been created, those who run large-scale search systems begin to capture more and more data over time as part of the feedback loop for increasing profits. It is a basic assumption that the larger a search index is, the more valuable the insights it can provide, and so violations of privacy, consent and user expectations become the norm due to the simple economic incentives in place for any search-based system.

This is nowhere near a comprehensive list of the dangers of search, but even just this handful of examples makes a pretty powerful point. It is right, reasonable, and responsible to have a default assumption that people creating search will make systems that enable harms.

Despite that risk, I think it's possible to create a search system for the fediverse that could be responsible. The key lies in a concept that has come up many times in discussion: consent.

How search?

The road to a search that works for the fediverse is to consider those who want such a thing. Increasingly, they're from communities that value the benefits of search even while being keenly aware of the risks. Hilda Bast did a wonderful job of gathering research on quote tweeting as part of an effort to bring some hard data to the fediverse discussion about quoting features, and it seems like it would be instructive for the search conversation as well. One finding that jumped out to me: "There’s quite solid evidence that journalists, politicians, and African-Americans use QTs more often." This difference in usage seems to mimic the positive use cases of search that these groups (and other similarly distinct and valuable contributors to social media) bring to the conversation.

So: What about a truly opt-in, consent-based approach to search? We need a search bot that we can follow.

Whereas today's consumer web is shaped by Google sending its bots across the internet as rapaciously as possible, on the fediverse it should be entirely possible to create a search engine that is exposed to users as a bot that you can follow — and unfollow — whenever you want. When you're following the searchbot, and you make a public post, it'll automatically be indexed and included in search. When you unfollow, or post something that the bot doesn't have permission to see, that content is automatically excluded from the search index.
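To make the follow/unfollow rule concrete, here's a minimal sketch in Python. It isn't tied to Mastodon's actual API or to ActivityPub; the class names, the visibility labels, and the choice to purge an account's earlier posts when it unfollows are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: str
    author: str
    text: str
    visibility: str  # assumed labels: "public", "unlisted", "followers", "direct"


@dataclass
class ConsentIndex:
    """Toy in-memory index holding only public posts from accounts that follow the bot."""

    followers: set = field(default_factory=set)
    posts: dict = field(default_factory=dict)

    def on_follow(self, account: str) -> None:
        # Following the bot is the opt-in signal.
        self.followers.add(account)

    def on_unfollow(self, account: str) -> None:
        # Unfollowing withdraws consent. Assumption: earlier posts are purged too.
        self.followers.discard(account)
        self.posts = {
            pid: post for pid, post in self.posts.items() if post.author != account
        }

    def on_post(self, post: Post) -> bool:
        # Index only public posts from accounts that currently follow the bot.
        if post.author not in self.followers or post.visibility != "public":
            return False
        self.posts[post.post_id] = post
        return True

    def search(self, query: str) -> list:
        # Naive case-insensitive substring match, returning sorted post IDs.
        needle = query.lower()
        return sorted(
            pid for pid, post in self.posts.items() if needle in post.text.lower()
        )
```

The important part is that consent gets checked at indexing time and again whenever someone unfollows, so the index never holds anything from an account that isn't following the bot right now.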

"But wait!" many will exclaim, "Who would ever opt into this?" People are going to be pretty skeptical that such an implementation could work. Well, I sure as hell would opt in, and I think a lot of others who like to be able to dip into the public conversation would do so as well, at least part of the time. You see, what this kind of system allows is for us to choose when we're in public online, just like we can make that same decision most of the time in our offline lives. (Of course, what is public isn't always that simple.)

This idea is so unfamiliar that it often takes people a little while to really process the implications of it when I first describe the idea to them. Even as I've batted the idea around with smart people for a few years, their default stance is often to start from a standpoint that it would be impossible to build such a system, even though it's very obviously much simpler and easier than the complicated and fraught search systems we all rely on every day. There are a couple of other really cool aspects about an opt-in search system:

  • By letting people explicitly choose when their content is indexed by the system, you end up with content that's a bit edited and filtered by default, likely improving the quality of information in the system. And since everyone using this search knows that it's not comprehensive, by design, they'll be far more aware of the shortcomings of the system going in, in contrast to many users of the web today, who think that something that's not in Google must not exist.
  • The opt-in nature of this search also necessarily limits its scale and size, and effectively negates the surveillance-based economic incentives that make other systems so rapacious about gathering our data. That means there's far less motivation for a creator to try to maximize monetization, it limits the kind of advertising or other extractive business models that can be applied, and it greatly reduces the risk of larger entities trying to purchase the entire system because it's so explicitly trust-based. It also greatly brings down operational costs because scaling of the technical infrastructure becomes far more predictable — you could even implement an invite-based system to reduce the risk of unexpected technical surprises.
  • A search system that requires people to opt in also makes it far easier to identify whether an account that tries to follow the search bot is a real person or not, significantly reducing the impact of common spamming and manipulation techniques. Coordinated campaigns to try to game hashtags or search rankings are also far easier to detect if you can see the entire network of people that an account is connected to. You can even implement common community-management techniques like requiring an account to wait in a queue or to have been active for a certain period of time before it's allowed to contribute to the search index. These kinds of community moderation techniques are well understood and quite mature after decades online, but they've never been effectively applied to search before.
  • In true fediverse fashion, such a search system could also be open source, with the code inspectable by the community, and the criteria for search ranking being visible and understood by the community. Instead of a proprietary or secretive search algorithm that rewards those who are most committed to gaming the system, you can make the search ranking transparent, since you've removed many of the financial incentives that induce people to spam conventional search engines.
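Two of the ideas above, the waiting period before an account can contribute to the index and a ranking rule that anyone can read, are small enough to sketch in Python. The thresholds, names, and scoring formula here are hypothetical placeholders, not a recommendation; a real community would want to set them together.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; a real deployment would publish and debate these.
MIN_ACCOUNT_AGE = timedelta(days=30)
REVIEW_QUEUE_DELAY = timedelta(days=2)


@dataclass
class FollowRequest:
    account: str
    account_created: datetime
    requested_at: datetime


def is_eligible(request: FollowRequest, now: datetime) -> bool:
    """An account contributes to the index only after both waiting periods pass."""
    old_enough = now - request.account_created >= MIN_ACCOUNT_AGE
    waited_in_queue = now - request.requested_at >= REVIEW_QUEUE_DELAY
    return old_enough and waited_in_queue


def score(text: str, posted_at: datetime, query: str, now: datetime) -> float:
    """Published ranking rule: count of query-term matches, discounted by age in days."""
    words = text.lower().split()
    matches = sum(words.count(term) for term in query.lower().split())
    age_days = max((now - posted_at).days, 0)
    return matches / (1 + age_days)
```

Because the whole rule fits in a few lines, anyone can check why one post outranks another, which is the kind of transparency the open source approach is meant to give.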

Could it work?

Of course it won't necessarily be easy to build such an opt-in search system for the fediverse, but it's absolutely doable. The technology for building such a system is more accessible than ever, and the fediverse is still small enough in scale that one could use off-the-shelf tech to get started. The costs would be low enough to create as an experiment, and cheap enough over time to operate that it could be sustained by donations, perhaps alongside a few non-surveillance-based sponsorship ads.

The harder part is actually starting by building trust, socializing the idea with many long-time and/or trusted members of key fediverse communities. Explaining the concept and implementation to a number of moderators and admins of key fediverse instances and servers would go a long way, especially if they were allowed to examine the full system before anyone in their community is asked or invited to opt in. Picking early users who are members of vulnerable communities, and who have been consistent in their criticism and skepticism about the tech titans that define search today would also do a lot to build trust. And most importantly, providing robust and reliable tools for people to verify that their content is never included without their consent will also reduce a lot of the initial hesitancy that will surround such an offering. You'll need to have kick-ass privacy and content policies in place right from the start, but the community will be motivated to help you get them right.
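As one possible shape for a verification tool, here's a hedged Python sketch of an audit that someone could run against their own account. It assumes the service publishes two things for that account: a chronological log of follow/unfollow events, and the IDs and timestamps of whatever it currently has indexed. Both data sources and all the names are hypothetical, as is the rule that only posts made since the most recent follow are legitimately indexed.

```python
from datetime import datetime


def current_consent_start(events):
    """Return when the account's currently open follow began, or None if not following.

    `events` is a chronological list of (timestamp, kind) pairs, where kind is
    "follow" or "unfollow".
    """
    start = None
    for timestamp, kind in events:
        if kind == "follow":
            if start is None:
                start = timestamp
        elif kind == "unfollow":
            start = None
    return start


def audit(indexed_posts, events):
    """Return the IDs of indexed posts that fall outside the account's current consent.

    `indexed_posts` is a list of (post_id, posted_at) pairs reported by the service.
    """
    start = current_consent_start(events)
    return [
        post_id
        for post_id, posted_at in indexed_posts
        if start is None or posted_at < start
    ]
```

An empty result means everything indexed for the account was posted while it was following the bot; anything else is a list of posts the service should be asked to remove.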

But: it's all very possible! It requires patience, and a genuine willingness to build something that grows slowly over time, instead of trying to take over the world. The reward for that patience will be both the creation of a valuable and unique new resource for the open web, as well as a powerful demonstration to the technology world that other models are possible, which may be just as valuable as the service itself.

I hope someone (or a lot of someones!) builds a system like this, and that we see the flourishing of new approaches and a vital, thriving reminder that the open web is the truest way to inspire an exciting new wave of innovation. If you build it, let me know; I'll be first in line to follow your bot.