Two Approaches to Solving the "Quiet Fediverse" Problem: Conversation Backfilling Mechanisms

洪 民憙 (Hong Minhee) @hongminhee@hackers.pub

Anyone who has used the Fediverse has likely experienced this at least once. It seems like an interesting discussion is taking place, but when you look at the conversation, you only see a few replies, or disconnected replies appearing sporadically without context. It feels like being in a group discussion where you can only hear some people's voices.

This is the "quiet Fediverse" problem that Fediverse users often encounter. The presentation The Fediverse is Quiet—Let's Fix That! at FOSDEM in Brussels in February 2025 addressed this issue head-on.

In this article, we'll examine why conversations become disconnected in the Fediverse and take a detailed look at the two main approaches developers have proposed to solve the problem. For each method we cover the technical principles, real implementation cases, and the advantages and disadvantages, with examples throughout.

Note

This article is based on Backfilling Conversations: Two Major Approaches written by @julian of NodeBB, translated into Korean and supplemented with additional analysis for the Korean developer community.

While based on the structure and core ideas of the original article, this version reinforces technical concept explanations and adds actual implementation cases. It was written with the help of AI.

Thanks to the original author @julian and the Fediverse developer community who participated in active discussions.

Root Cause: The Distributed Nature of ActivityPub

What is ActivityPub?

First, we need to understand the ActivityPub protocol that forms the foundation of the Fediverse. ActivityPub is a W3C standard protocol for decentralized social networks that allows users on different servers to interact with each other.

In ActivityPub, all interactions are expressed as activities. For example, when you create a new post, a Create(Note) activity is generated, and when you reply, another Create(Note) activity is generated indicating that it's a reply to that post. More details can be found in the ActivityStreams 2.0 specification.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "id": "https://alice.example/activities/create-reply-123",
  "actor": "https://alice.example/users/alice",
  "published": "2025-06-09T10:30:00Z",
  "to": ["https://bob.example/users/bob"],
  "object": {
    "type": "Note",
    "id": "https://alice.example/notes/reply-123",
    "content": "That's a really interesting perspective!",
    "inReplyTo": "https://bob.example/posts/original-post",
    "attributedTo": "https://alice.example/users/alice"
  }
}
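As a rough illustration of how such an activity could be assembled in code, the sketch below builds the same shape as a plain object. The buildReply helper, its parameters, and the ID scheme are hypothetical and only mirror the JSON above; a real implementation would typically use an ActivityPub library instead.

```typescript
// Minimal shapes for the fields used in the JSON above (not a full AS2 model).
interface Note {
  type: "Note";
  id: string;
  content: string;
  inReplyTo: string;
  attributedTo: string;
}

interface CreateActivity {
  "@context": string;
  type: "Create";
  id: string;
  actor: string;
  published: string;
  to: string[];
  object: Note;
}

// Hypothetical helper: wraps a reply Note in a Create activity.
function buildReply(
  serverOrigin: string, // e.g. "https://alice.example"
  actor: string,        // actor ID of the reply author
  slug: string,         // local identifier, assumed unique on this server
  content: string,
  inReplyTo: string,    // ID of the post being replied to
  recipient: string,    // actor ID of the parent post's author
  published: Date,
): CreateActivity {
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Create",
    id: `${serverOrigin}/activities/create-${slug}`,
    actor,
    published: published.toISOString(),
    to: [recipient],
    object: {
      type: "Note",
      id: `${serverOrigin}/notes/${slug}`,
      content,
      inReplyTo, // this is what marks the Note as a reply
      attributedTo: actor,
    },
  };
}
```

The only thing that distinguishes a reply from a new post here is the inReplyTo field on the Note; the wrapping Create activity is the same in both cases.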

The Dilemma of Distribution

The distributed nature of ActivityPub is the root cause of the problem. Unlike on centralized platforms (X, Facebook, etc.), conversations in the Fediverse are spread across multiple servers.

Consider a scenario where Alice (alice.example) writes an original post, Bob (bob.example) replies to Alice's post, Charlie (charlie.example) replies to Bob's reply, and Dave (dave.example) directly replies to Alice's original post:

Alice's original post
├── Bob's comment
│   └── Charlie's comment
└── Dave's comment

In this case, each server may only have the following information. alice.example might know about Alice's original post, Bob's reply, and Dave's reply, but not about Charlie's reply. bob.example might know about Alice's original post, Bob's reply, and Charlie's reply, but not about Dave's reply. As a result, no one can see the complete picture of the entire conversation.
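The partial views described above can be reproduced with a small simulation. This is a sketch under simplified assumptions, not a model of any real server: a server is assumed to know a post if it hosts the author, if it hosts the author of the parent post (the reply was delivered there), or if one of its users replied to the post (it had to fetch the parent). The ThreadPost type and the knownTo function are hypothetical names.

```typescript
interface ThreadPost {
  id: string;
  server: string;     // server hosting the post's author
  inReplyTo?: string; // ID of the parent post, if any
}

// The four-post scenario from the text.
const exampleThread: ThreadPost[] = [
  { id: "alice-post", server: "alice.example" },
  { id: "bob-reply", server: "bob.example", inReplyTo: "alice-post" },
  { id: "charlie-reply", server: "charlie.example", inReplyTo: "bob-reply" },
  { id: "dave-reply", server: "dave.example", inReplyTo: "alice-post" },
];

// Returns the IDs of the posts a given server is assumed to know about.
function knownTo(server: string, posts: ThreadPost[]): string[] {
  const byId = new Map<string, ThreadPost>();
  posts.forEach((post) => byId.set(post.id, post));

  return posts
    .filter((post) => {
      // (a) The server hosts the author.
      if (post.server === server) return true;
      // (b) The post replies to something hosted here, so it was delivered here.
      const parentPost = post.inReplyTo ? byId.get(post.inReplyTo) : undefined;
      if (parentPost !== undefined && parentPost.server === server) return true;
      // (c) A local user replied to this post, so it was fetched at some point.
      return posts.some((r) => r.inReplyTo === post.id && r.server === server);
    })
    .map((post) => post.id);
}
```

Under these assumptions, knownTo("alice.example", exampleThread) omits charlie-reply and knownTo("bob.example", exampleThread) omits dave-reply, which matches the gaps described in the scenario.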

Foundational Concept for Solutions: The context Property

Before examining the two main solutions, we need to understand a key concept: the context property. The context property defined in ActivityStreams 2.0 is used to group related objects. However, because the specification leaves its definition "intentionally vague," implementations use it in a variety of ways.

Actual Forms of context Values

1. Simple Identifier

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "content": "This is the first comment",
  "context": "https://example.com/conversations/abc123"
}

2. Mastodon Style (ostatus:conversation)

Mastodon uses the conversation property from the OStatus era alongside the ActivityPub standard context.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "ostatus": "http://ostatus.org#",
      "conversation": "ostatus:conversation"
    }
  ],
  "type": "Note",
  "content": "This is a reply",
  "context": "https://mastodon.social/contexts/abc123",
  "conversation": "tag:mastodon.social,2025:objectId=12345:objectType=Conversation"
}

3. Interpretable Collection URL (FEP 7888 Method)

In this case, sending a GET request to the context URL returns an OrderedCollection containing all posts in that conversation. This is the approach proposed in FEP-7888: Demystifying the context property.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "content": "This is part of a thread conversation",
  "context": "https://forum.example/topics/technology/thread-42",
  "inReplyTo": "https://forum.example/posts/789"
}
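Because the same property can carry such different values, a consumer usually has to inspect what it gets back before deciding how to use it. The sketch below is one possible way to tell the cases apart; the classifyContext function and the ContextKind labels are hypothetical, and it assumes the caller has already tried to dereference the context URL and passes in the parsed response (or nothing, if the request failed or the value is not a URL). Accepting a plain Collection in addition to OrderedCollection is also an assumption.

```typescript
type ContextKind = "none" | "opaque-identifier" | "resolvable-collection";

// Classifies a context value based on what dereferencing it returned.
// `dereferenced` is the parsed JSON response, or undefined/null if nothing
// usable came back.
function classifyContext(
  context: string | undefined,
  dereferenced: unknown,
): ContextKind {
  if (context === undefined) return "none";

  if (typeof dereferenced === "object" && dereferenced !== null) {
    const type = (dereferenced as { type?: unknown }).type;
    // A collection can be walked to backfill the conversation.
    if (type === "Collection" || type === "OrderedCollection") {
      return "resolvable-collection";
    }
  }

  // Still useful for grouping posts locally, but not for backfilling.
  return "opaque-identifier";
}
```

An opaque identifier is still enough to group posts that share it; only the resolvable form lets a server ask for posts it has not seen yet.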

First Approach: Reply Tree Crawling

Overview and History

The reply tree crawling method was pioneered by @jonnyjonny (good kind) of Mastodon. It was first proposed on April 15, 2024 and merged into Mastodon core on March 12, 2025.

The core idea of this approach is "fetch all replies." It involves sequentially crawling the entire reply tree to find missing conversations.

Technical Operating Principles

1. Required Prerequisites

For this approach to work, every ActivityPub object in the thread must expose a replies collection: a collection listing the replies that the object has received. A crawler can then enumerate all replies to a specific post.

{
  "id": "https://alice.example/posts/1",
  "type": "Note",
  "content": "What do you think?",
  "replies": {
    "type": "OrderedCollection",
    "id": "https://alice.example/posts/1/replies",
    "totalItems": 3,
    "first": "https://alice.example/posts/1/replies?page=1"
  }
}
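A replies collection like this one is usually paged, so reading it means following the first link and then each page's next link. The sketch below shows that walk under simplifying assumptions: pages are looked up through a synchronous fetchPage callback standing in for an HTTP request, page items are plain ID strings in an items field (real pages may use items or orderedItems and may embed full objects), and the maxPages cap is an arbitrary safety limit. All type and function names are hypothetical.

```typescript
interface ReplyCollection {
  id: string;
  totalItems: number;
  first?: string; // URL of the first page
}

interface ReplyCollectionPage {
  id: string;
  items: string[]; // reply IDs on this page
  next?: string;   // URL of the following page, if any
}

// Walks first -> next -> next ... and gathers every reply ID it can reach.
function collectReplyIds(
  collection: ReplyCollection,
  fetchPage: (url: string) => ReplyCollectionPage | undefined,
  maxPages = 20,
): string[] {
  const ids: string[] = [];
  const seenPages = new Set<string>();
  let url = collection.first;

  while (url !== undefined && !seenPages.has(url) && seenPages.size < maxPages) {
    seenPages.add(url);
    const page = fetchPage(url);
    if (page === undefined) break; // page unreachable: keep what we have so far
    ids.push(...page.items);
    url = page.next;
  }

  return ids;
}
```

Tracking visited page URLs guards against a misbehaving server whose next links loop back on themselves.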

2. Crawling Algorithm

The operation of reply tree crawling is essentially similar to depth-first search (DFS). It starts from the initial post and repeatedly goes down to find all replies.

Looking at the specific process, it first checks the replies collection of the starting post. This collection contains a list of replies directly made to that post. Then it fetches and processes each reply one by one, with the important point being that each reply can also have its own replies collection.

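// Note and fetchNote() are assumed to come from the surrounding codebase:
// fetchNote() dereferences a URL into a Note object.
async function crawlReplyTree(
  postUrl: URL,
  visited: Set<string> = new Set(),
): Promise<Note[]> {
  // Guard against cycles and against crawling the same post twice.
  if (visited.has(postUrl.href)) return [];
  visited.add(postUrl.href);

  const post = await fetchNote(postUrl);
  const allReplies: Note[] = [];

  // Direct replies, if the server exposes a replies collection at all.
  const replies = await post.getReplies();
  if (replies) {
    for await (const reply of replies.getItems()) {
      // Skip non-Note items and replies without a dereferenceable ID.
      if (reply instanceof Note && reply.id != null) {
        allReplies.push(reply);
        // Each reply can have its own replies collection: recurse depth-first.
        const subReplies = await crawlReplyTree(reply.id, visited);
        allReplies.push(...subReplies);
      }
    }
  }

  return allReplies;
}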

This approach rests on the assumption that each node (post) accurately lists the replies made to it.

3. Actual Implementation in Mastodon

Mastodon's implementation adapts the theoretical algorithm to the practical constraints of real network environments.

According to @jonnyjonny (good kind), the current implementation includes several practical considerations. It starts from the expanded post and proceeds downward, can start crawling from any point in the tree, and includes a cooldown mechanism to prevent duplicate crawling.
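The cooldown idea can be sketched in a few lines. This is not Mastodon's actual code (Mastodon is written in Ruby) and the details are assumptions: the CrawlCooldown class, its method name, and the choice to key the cooldown by post ID are all hypothetical. It only illustrates the general technique of remembering when a thread was last crawled and refusing to start again too soon.

```typescript
// Remembers when each post was last crawled and blocks repeat crawls
// that arrive within the cooldown window.
class CrawlCooldown {
  private readonly cooldownMs: number;
  private readonly lastCrawled = new Map<string, number>();

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true (and records the attempt) if the post may be crawled now.
  // The current time is passed in so the logic is easy to test.
  tryAcquire(postId: string, nowMs: number): boolean {
    const last = this.lastCrawled.get(postId);
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      return false; // crawled too recently, skip
    }
    this.lastCrawled.set(postId, nowMs);
    return true;
  }
}
```

A caller would check tryAcquire(postId, Date.now()) before starting a crawl, so that many users expanding the same popular post do not each trigger a full crawl.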

Advantages

  • Universality: The inReplyTo and replies properties are used in almost all ActivityPub implementations, so this approach can be applied without significant changes to existing infrastructure.

  • Implementation Consistency: The usage of these properties doesn't vary greatly across most ActivityPub implementations.

  • Complete Tree Construction: In ideal cases, you can obtain a complete conversation tree including all branches and leaves.

Disadvantages

  • Network Vulnerability: If a single node in the reply tree becomes temporarily or permanently inaccessible, all branches derived from that node also become inaccessible.

  • Linear Workload Increase: Workload such as CPU time and network requests grows in proportion to the size of the reply tree, so performance issues may arise in large-scale discussions.

  • Need for Re-crawling: To discover new branches, the entire reply tree must be crawled again. In rapidly growing discussions, you may not get a complete tree depending on when the crawling starts.

  • Incomplete Implementation Reality: In practice, not all ActivityPub implementations provide a replies collection. For performance reasons, Mastodon includes only up to 5 replies from the same server in the replies collection, and many smaller implementations omit the collection or implement it only partially.
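The network vulnerability point is easy to see in a toy crawler. The sketch below is a simplified, synchronous stand-in for a real crawl: fetchNode is a hypothetical callback that returns undefined when a post cannot be fetched, and ReplyNode carries reply IDs directly instead of a paged collection. It records unreachable posts rather than aborting, which shows exactly which branches are lost.

```typescript
interface ReplyNode {
  id: string;
  replies: string[]; // IDs of direct replies
}

interface CrawlReport {
  found: string[];       // posts that were fetched successfully
  unreachable: string[]; // posts whose fetch failed
}

// Depth-first crawl that tolerates failing nodes and reports them.
function crawlTolerant(
  rootId: string,
  fetchNode: (id: string) => ReplyNode | undefined,
): CrawlReport {
  const report: CrawlReport = { found: [], unreachable: [] };

  const visit = (id: string): void => {
    const node = fetchNode(id);
    if (node === undefined) {
      // We cannot read this node's replies, so its whole subtree stays hidden.
      report.unreachable.push(id);
      return;
    }
    report.found.push(id);
    node.replies.forEach((childId) => visit(childId));
  };

  visit(rootId);
  return report;
}
```

If Bob's server is down in the four-post scenario from earlier, the report lists Bob's reply as unreachable and never mentions Charlie's reply at all, because the only path to it runs through Bob's replies collection.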

Current Implementation Status

Currently, Mastodon is the only complete implementation of this approach. However, this approach is not unique to Mastodon and can be adopted by other implementations.

Second Approach: Context Owner Approach

Overview and Background

The context owner approach was born from the combination of several FEPs[1]. FEP-7888 addresses "demystifying the context property," FEP-171b defines "conversation containers," and FEP-f228 proposes the integration and extension of these FEPs.

The core of this approach is the concept of a "context owner". It's a centralized approach where the original author of the conversation or a designated entity manages all the content of that conversation.

Technical Operating Principles

1. The Role of the Context Owner

Who becomes the context owner? Generally, the user who created the top-level post (root post) of the thread becomes the context owner. For example, if Alice wrote an original post asking "How's the weather today?", Alice becomes the context owner of that conversation.

However, in forum or group environments, the forum administrator or group owner might take on the role of context owner. The key is that someone has the authority to decide which posts officially belong to that conversation.

The context owner provides an OrderedCollection that includes all members of the conversation they manage.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://w3id.org/fep/171b"
  ],
  "type": "OrderedCollection",
  "id": "https://alice.example/conversations/tech-discussion",
  "attributedTo": "https://alice.example/users/alice",
  "collectionOf": "Activity",
  "totalItems": 15,
  "orderedItems": [
    "https://alice.example/activities/add/1",
    "https://alice.example/activities/add/2",
    "https://alice.example/activities/add/3"
  ]
}

2. Two-Step Activity Process

In this approach, adding a comment requires two steps. Why make it so complicated?

The first reason is moderation. Simply writing a reply doesn't automatically include it in the conversation; it must be approved by the context owner.

The second reason is consistency. The collection managed by the context owner only contains Add activities, so other servers reading this collection later can clearly know that "these are all contents approved by the context owner."

The third reason is broadcasting. Not only direct comments but all comments and replies belonging to the conversation are sent to the context owner, so the context owner is aware of all nodes included in that conversation. Therefore, they can notify all conversation participants that a new comment has been added.

Step 1: The reply author sends a typical Create(Note) activity

Bob wants to reply to Alice's post. Bob creates a Create(Note) activity as usual, but includes the conversation ID managed by Alice in the context property of the Note object.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://bob.example/users/bob",
  "published": "2025-06-09T11:00:00Z",
  "to": ["https://alice.example/users/alice"],
  "object": {
    "type": "Note",
    "id": "https://bob.example/notes/reply-456",
    "content": "That's a really good point!",
    "inReplyTo": "https://alice.example/posts/original",
    "context": "https://alice.example/conversations/tech-discussion"
  }
}

The important point is that Bob sends this activity directly to Alice, the context owner (see the to field). This is to let Alice know about Bob's reply.

Step 2: The context owner (Alice) creates an Add(Note) activity

Alice receives Bob's reply and determines that it's content worth including in her conversation. Alice then creates an Add(Note) activity to add Bob's reply to her conversation collection.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Add",
  "actor": "https://alice.example/users/alice",
  "published": "2025-06-09T11:05:00Z",
  "object": "https://bob.example/notes/reply-456",
  "target": {
    "type": "OrderedCollection",
    "id": "https://alice.example/conversations/tech-discussion",
    "attributedTo": "https://alice.example/users/alice"
  },
  "to": ["https://www.w3.org/ns/activitystreams#Public"]
}

This Add activity means "Alice has officially included Bob's reply in her conversation." If Alice had determined that Bob's reply was spam or inappropriate content, she might not have created this Add(Note) activity.
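The owner's side of the two-step process can be sketched as a small inbox handler. This is a minimal illustration under stated assumptions, not the behavior of any particular implementation: the ContextOwner class, the isAcceptable moderation callback, and the simplified shapes (for example, target as a plain ID string rather than the embedded object in the JSON above) are all hypothetical.

```typescript
interface IncomingReply {
  id: string;
  attributedTo: string;
  content: string;
  context?: string;
}

interface AddActivity {
  type: "Add";
  actor: string;
  object: string; // ID of the accepted reply
  target: string; // ID of the conversation collection
  to: string[];
}

const PUBLIC_AUDIENCE = "https://www.w3.org/ns/activitystreams#Public";

class ContextOwner {
  readonly actor: string;
  readonly contextId: string;
  readonly added: AddActivity[] = []; // what the context collection would serve
  private readonly isAcceptable: (reply: IncomingReply) => boolean;

  constructor(
    actor: string,
    contextId: string,
    isAcceptable: (reply: IncomingReply) => boolean,
  ) {
    this.actor = actor;
    this.contextId = contextId;
    this.isAcceptable = isAcceptable;
  }

  // Step 2: decide whether a received Create(Note) becomes part of the conversation.
  receive(reply: IncomingReply): AddActivity | undefined {
    if (reply.context !== this.contextId) return undefined; // other conversation
    if (this.added.some((a) => a.object === reply.id)) return undefined; // duplicate
    if (!this.isAcceptable(reply)) return undefined; // moderation: no Add is issued

    const add: AddActivity = {
      type: "Add",
      actor: this.actor,
      object: reply.id,
      target: this.contextId,
      to: [PUBLIC_AUDIENCE],
    };
    this.added.push(add);
    return add; // the caller would deliver this to the conversation's audience
  }
}
```

Rejecting a reply here is passive: the owner does not send anything, and the reply simply never appears in the collection that other servers backfill from.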

3. Backfill Mechanism

Individual implementations can request the entire conversation content from the context owner.

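// fetchCollection() is assumed to dereference the context URL into a collection.
async function performContextBackfill(contextUrl: URL): Promise<Note[]> {
  const collection = await fetchCollection(contextUrl);
  const notes: Note[] = [];

  for await (const item of collection.getItems()) {
    // The collection holds Add activities; anything else is ignored.
    if (item instanceof Add) {
      // Resolve the object that the context owner added to the conversation.
      const note = await item.getObject();
      if (note instanceof Note) {
        notes.push(note);
      }
    }
  }

  return notes;
}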

Advantages

  • Benefits of Pseudo-centralization: A consistent conversation state can be maintained through the "single source of truth" provided by the context owner.

  • Efficient Network Usage: The entire conversation can be retrieved with a single request to the context owner, making it more network-efficient than reply tree crawling.

  • Overcoming Intermediate Node Failures: Unlike reply tree crawling, even if an intermediate node goes down, the entire conversation can still be accessed through the context owner.

  • Efficient Deduplication: Object deduplication is possible at the context level, reducing the total number of network requests and CPU time.

  • Synchronization Optimization: Network calls can be further reduced through synchronization methods using ID checksums.
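The ID checksum idea from the last point can be illustrated as follows. The source does not specify a concrete scheme, so this sketch is an assumption throughout: it XORs a per-ID hash so that the result does not depend on ordering, and it uses 32-bit FNV-1a only to stay self-contained, where a real protocol would more likely use a cryptographic hash. The function names are hypothetical, and IDs are assumed to be unique (a duplicate would cancel itself out under XOR).

```typescript
// 32-bit FNV-1a hash of a string (stand-in for a cryptographic hash).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Order-independent digest of a set of IDs: XOR of the per-ID hashes.
function digestIds(ids: string[]): number {
  return ids.reduce((acc, id) => (acc ^ fnv1a(id)) >>> 0, 0);
}

// True if the local copy of the conversation differs from the owner's.
function needsSync(localIds: string[], remoteDigest: number): boolean {
  return digestIds(localIds) !== remoteDigest;
}
```

If the context owner published such a digest alongside the collection, a server could compare it with the digest of the IDs it already holds and skip fetching the full collection when the two match.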

Disadvantages

  • Context Owner Dependency: The biggest weakness is the dependency on the context owner. If the context owner's server is inaccessible, the entire conversation backfill becomes impossible.

  • Limited Visibility: The context owner can only respond with objects/activities they are aware of.

  • Missing Upward Propagation Issue: As a fundamental limitation, the context owner cannot know about lower branches that aren't propagated back up to the root.

  • Implementation Support Required: This approach only works if the context owner supports it, so it must be combined with other backfill strategies.

Current Implementation Status

NodeBB, Discourse, WordPress, Frequency, Mitra, and Streams are currently implementing this approach, while Lemmy and Piefed have expressed interest.

Important Points of Contention

1. Conflict of Moderation Paradigms

This is a key issue raised by @silverpill in the related NodeBB thread.

"I don't fully agree with this statement, because these 'threading paradigms' suggest two different solutions to the problem of moderation."

Moderation in the Context Owner Approach

If Alice doesn't create an Add(Note) activity for a spam comment, that comment is excluded from the conversation.

Moderation in Reply Tree Crawling

Each reply is independent, and authors can only moderate their own replies directly. The original post author cannot control the entire conversation.
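The practical difference between the two paradigms shows up in what a reader ends up seeing. The sketch below contrasts them on the same set of replies. It is a deliberately reduced model with hypothetical names: deletedByAuthor stands in for an author moderating their own reply, and addedIds stands in for the replies the context owner has issued an Add for.

```typescript
interface ThreadReply {
  id: string;
  deletedByAuthor: boolean; // the author removed their own reply
}

// Context owner paradigm: a reply is visible only if the owner added it.
function contextOwnerView(replies: ThreadReply[], addedIds: string[]): string[] {
  return replies
    .filter((r) => !r.deletedByAuthor && addedIds.indexOf(r.id) !== -1)
    .map((r) => r.id);
}

// Reply tree paradigm: a reply is visible unless its own author removed it.
function replyTreeView(replies: ThreadReply[]): string[] {
  return replies.filter((r) => !r.deletedByAuthor).map((r) => r.id);
}
```

A spam reply that the owner never added disappears from the first view but remains in the second, where only the spammer could remove it.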

2. Solutions to the Missing Upward Propagation Issue

Utilizing Addressing Rules (FEP-171b)

FEP-171b presents the rule that "The audience of a reply MUST be copied from a conversation root."

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://charlie.example/users/charlie",
  "object": {
    "type": "Note",
    "content": "Reply to Bob's comment",
    "inReplyTo": "https://bob.example/comments/2",
    "context": "https://alice.example/conversations/1",
    "to": [
      "https://bob.example/users/bob",
      "https://alice.example/users/alice"
    ]
  }
}
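In code, this rule amounts to deriving a reply's recipients from the root post instead of from the immediate parent alone. The sketch below is one reading of it, with assumptions labeled: copying the root's to and cc follows the quoted rule, while additionally addressing the parent author and the root author mirrors the JSON example above and is not claimed to be required by the FEP. The Addressing type and the function names are hypothetical.

```typescript
interface Addressing {
  to: string[];
  cc: string[];
}

// Removes duplicates while keeping first-seen order.
function uniqueRecipients(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// Builds the addressing for a reply anywhere in the thread.
function replyAddressing(
  rootAudience: Addressing, // to/cc of the conversation root
  rootAuthor: string,       // the context owner, so the reply propagates upward
  parentAuthor: string,     // author of the post being replied to
): Addressing {
  const to = uniqueRecipients([parentAuthor, rootAuthor].concat(rootAudience.to));
  // Anyone already in `to` does not need to be repeated in `cc`.
  const cc = uniqueRecipients(rootAudience.cc).filter(
    (recipient) => to.indexOf(recipient) === -1,
  );
  return { to, cc };
}
```

Because the root author is always a recipient, Charlie's reply to Bob reaches Alice's server directly, which is what closes the upward propagation gap.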

Hybrid Backfill Mechanism

Many implementations adopt an approach that combines multiple methods.

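async function hybridBackfill(conversationId: URL): Promise<Note[]> {
  // Strategies in order of preference; each is assumed to return the notes it found.
  const strategies = [
    () => contextOwnerBackfill(conversationId),
    () => replyTreeCrawling(conversationId),
    () => mentionBasedDiscovery(conversationId)
  ];

  for (const strategy of strategies) {
    try {
      const result = await strategy();
      // Stop at the first strategy that yields anything.
      if (result.length > 0) return result;
    } catch (error) {
      // A failing strategy is not fatal: fall through to the next one.
      console.warn('Strategy failed, trying next:', error);
    }
  }

  // Every strategy failed or came back empty.
  return [];
}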

Additional Backfill Mechanisms

  1. Periodic Crawling Backfill: This approach is like a regular health check-up. The system checks active conversations at set intervals to see if there are any missing replies.

  2. User-Triggered Backfill: When a user accesses a specific conversation page, the system immediately reviews the context collection it currently holds and explores missing replies in real-time.

  3. Mention-Based Backfill: A mechanism that discovers missing reply chains through users' natural behavior of mentioning others in conversations.

    async function onMentionReceived(activity: Create): Promise<void> {
      const mention = await activity.getObject();
    
      if (mention.context && mention.replyTargetId) {
        const missingChain = await traceReplyChain(await mention.getReplyTarget());
        await addToContext(mention.context, missingChain);
      }
    }
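The first two mechanisms reduce to two small decisions: which conversations are due for another check, and which items in a context collection are not yet held locally. The sketch below covers both under assumed data shapes. TrackedConversation, the timestamp fields, and both function names are hypothetical, and times are passed in as plain millisecond values to keep the logic deterministic.

```typescript
interface TrackedConversation {
  contextId: string;
  lastCheckedMs: number;  // when we last backfilled this conversation
  lastActivityMs: number; // when we last saw a new post in it
}

// Periodic backfill: pick conversations that are still active and have not
// been checked within the interval.
function dueForCheck(
  conversations: TrackedConversation[],
  nowMs: number,
  intervalMs: number,
  activeWindowMs: number,
): TrackedConversation[] {
  return conversations.filter(
    (c) =>
      nowMs - c.lastActivityMs <= activeWindowMs &&
      nowMs - c.lastCheckedMs >= intervalMs,
  );
}

// User-triggered backfill: compare what we hold with what the context
// collection lists, and return only the IDs that still need fetching.
function findMissing(localIds: string[], contextItems: string[]): string[] {
  const known = new Set<string>();
  localIds.forEach((id) => known.add(id));
  return contextItems.filter((id) => !known.has(id));
}
```

Fetching only the missing IDs keeps a user-triggered backfill cheap enough to run while the page is loading.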

Real Challenges

  1. Preventing Circular References: It's very important to prevent falling into infinite loops during the backfill process. In actual implementations, safety measures are put in place to track visited URLs and limit the maximum exploration depth.

  2. Performance Optimization: In large-scale conversations, hundreds of replies can be posted, and trying to process them all at once can put excessive load on the server. Batch processing is a method of dividing multiple conversations into small groups for sequential processing, with short rest periods between each batch.

  3. Error Handling and Recovery: Various types of errors can occur in a distributed network environment. In actual implementations, a resilient approach is used that sequentially tries various backfill strategies.
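The first two safeguards can be sketched together. This is a simplified, synchronous illustration with hypothetical names: fetchNode stands in for a network request, ReplyNode lists reply IDs directly, and the depth limit and batch size are arbitrary parameters chosen by the caller.

```typescript
interface ReplyNode {
  id: string;
  replies: string[];
}

// Depth-first crawl that tracks visited IDs and stops at a maximum depth.
function safeCrawl(
  rootId: string,
  fetchNode: (id: string) => ReplyNode | undefined,
  maxDepth: number,
): string[] {
  const visited = new Set<string>();
  const order: string[] = [];

  const visit = (id: string, depth: number): void => {
    if (depth > maxDepth || visited.has(id)) return; // too deep, or a cycle
    visited.add(id);
    const node = fetchNode(id);
    if (node === undefined) return; // unreachable: skip this branch
    order.push(id);
    node.replies.forEach((childId) => visit(childId, depth + 1));
  };

  visit(rootId, 0);
  return order;
}

// Splits work into fixed-size batches so they can be processed one at a
// time, with a pause between batches left to the caller.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

With the visited set in place, a malformed thread in which two posts list each other as replies terminates after visiting each post once.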

Standardization Efforts and Future Outlook

FEP Convergence Discussion

Currently, efforts are underway in the Fediverse community to integrate various FEPs through the FEP convergence thread.

The main FEPs being discussed include FEP-400e, which defines publicly addable ActivityPub collections, FEP-7888, which presents specific usage for the ambiguously defined context property, FEP-171b, which deals with centralized conversation management mechanisms, and FEP-76ea, which proposes methods for overall visualization of reply trees.

Collaboration Between Implementations

Currently, various implementations are collaborating for practical interoperability. This is a practical approach that aims to achieve the best results by combining currently available methods rather than waiting for a perfect standard to be finalized.

Collaboration Between NodeBB and Discourse

These two forum software platforms share forum-specific backfill mechanisms. Due to the nature of forums where conversations are structured and often long-lasting, context management utilizing topic and category concepts is particularly important.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Note",
  "context": "https://community.nodebb.org/topic/18844",
  "audience": "https://community.nodebb.org/category/development",
  "tag": [
    {
      "type": "Link",
      "href": "https://meta.discourse.org/t/activitypub-support/12345",
      "rel": "related"
    }
  ]
}

Compatibility Considerations with Mastodon

Since Mastodon is the largest Fediverse platform, other implementations need to consider compatibility with Mastodon. In particular, many support Mastodon's ostatus:conversation concept.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "ostatus": "http://ostatus.org#",
      "conversation": "ostatus:conversation"
    }
  ],
  "type": "Note",
  "content": "Mastodon compatible reply",
  "context": "https://mastodon.social/contexts/abc123",
  "conversation": "tag:mastodon.social,2025:objectId=12345:objectType=Conversation"
}

Maintaining such backward compatibility plays an important role in preventing fragmentation of the Fediverse ecosystem and improving user experience.
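On the receiving side, supporting both properties mostly means choosing one value to group posts by. The sketch below shows one possible policy and is an assumption, not a documented rule of any implementation: it prefers context when present and falls back to conversation, and it treats only http(s) values as candidates for dereferencing, since tag: URIs cannot be fetched. The type and function names are hypothetical.

```typescript
interface ThreadedNote {
  context?: string;      // ActivityStreams context
  conversation?: string; // ostatus:conversation
}

// Picks the value used to group a note with the rest of its conversation.
function conversationKey(note: ThreadedNote): string | undefined {
  return note.context ?? note.conversation;
}

// Only http(s) URLs are worth trying to fetch for backfill.
function canDereference(key: string): boolean {
  return key.startsWith("https://") || key.startsWith("http://");
}
```

A note that carries only a tag: style conversation value can still be grouped locally with other notes sharing it, even though that value cannot be used to backfill.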

Future Development Direction: Standardization of Hybrid Approaches

In the future, rather than finding a single "correct answer," a standardized approach that systematically combines multiple methods is likely to emerge. This is a best-of-both-worlds approach that leverages the strengths of each method while compensating for their weaknesses.

Best Practice Guidelines

  1. Implement Multiple Strategies: Never rely on just one backfill method. Considering the diversity and uncertainty of the Fediverse, combining multiple strategies is essential. Each strategy shows strengths in different situations, so you should secure the flexibility to choose appropriate strategies according to the situation.

    For example, the context owner approach might be effective in active forum discussions, but reply tree crawling might be more suitable for typical conversations on Mastodon.

  2. Resource Management: Backfill operations can consume significant server resources. Especially for popular conversations or large-scale discussions, hundreds of network requests may be needed. Therefore, appropriate limitations and regulation mechanisms should be implemented.

  3. Monitoring and Logging: It's important to continuously monitor the performance and reliability of the backfill system. You should track which methods are most effective, what kinds of errors occur frequently, etc.
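Guidelines 2 and 3 can start out very small. The sketch below shows a per-backfill request budget and a per-strategy success counter. Both classes and their method names are hypothetical, and a production system would likely add time-based rate limiting and persistent metrics on top.

```typescript
// Caps the number of network requests a single backfill may make.
class RequestBudget {
  private remaining: number;

  constructor(limit: number) {
    this.remaining = limit;
  }

  // Returns true if a request may be made, consuming one unit of budget.
  trySpend(): boolean {
    if (this.remaining <= 0) return false;
    this.remaining -= 1;
    return true;
  }
}

// Counts successes and failures per backfill strategy.
class StrategyStats {
  private readonly counts = new Map<string, { ok: number; failed: number }>();

  record(strategy: string, succeeded: boolean): void {
    const entry = this.counts.get(strategy) ?? { ok: 0, failed: 0 };
    if (succeeded) {
      entry.ok += 1;
    } else {
      entry.failed += 1;
    }
    this.counts.set(strategy, entry);
  }

  // Fraction of successful runs, or undefined if the strategy never ran.
  successRate(strategy: string): number | undefined {
    const entry = this.counts.get(strategy);
    if (entry === undefined) return undefined;
    return entry.ok / (entry.ok + entry.failed);
  }
}
```

A crawler would call trySpend() before each fetch and stop when it returns false, while record() after each strategy run gives the data needed to decide which strategy to try first.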

Conclusion

The "quiet Fediverse" problem is a fundamental challenge of decentralized social networks. The two main approaches examined in this article—reply tree crawling and the context owner approach—each have their own unique advantages and disadvantages.

Key Insights

There is no perfect solution. Both approaches show limitations in certain situations. Due to the inherent characteristics of distributed networks, achieving 100% perfect conversation recovery may be realistically difficult.

A hybrid approach is realistic. Most successful implementations use a combination of multiple backfill strategies. Having resilience where one method can compensate for another if it fails is important.

Standardization is in progress. Efforts to increase interoperability through the FEP process continue. However, rather than waiting for a complete standard, it's more realistic to pragmatically combine currently available methods.

User experience is key. While technical completeness is important, ultimately what matters is whether users can see complete conversations. Practical effects should take priority over technical elegance.

Future Direction

The conversation backfill problem in the Fediverse is not just a technical issue but a complex problem of governance, moderation, and user experience in decentralized networks.

In particular, the difference in moderation paradigms is a philosophical issue that goes beyond simple technical compatibility. Should the context owner be able to control the entire conversation, or should each reply author be able to moderate independently? These questions connect to fundamental considerations about what kind of social space the Fediverse should be.

2025 appears to be the year when solutions to these problems will be fully deployed and tested. Through the continued interest and participation of developers and users, the Fediverse can develop into a richer and more connected social network.

What's important is improvement rather than perfection. Even if the current "quiet Fediverse" problem isn't completely solved, if these efforts allow users to experience more complete conversations, that alone can be considered meaningful progress.


  1. FEP stands for Fediverse Enhancement Proposal, which is an official document system for proposing and discussing improvements to the Fediverse. It is used in the process of standardizing new features or protocol extensions. ↩︎
