@jcshepherd

What changes were proposed in this pull request?

This is a rewrite of concepts/index.md in ratis-docs that goes into substantially more detail about the Ratis/Raft architecture and how applications integrate with Ratis. At the time of creating this PR, I have not overwritten the existing documentation, because I suspect some review and editing will be needed; the proposal is to update the existing index.md before merging.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/RATIS-2388

How was this patch tested?

Documentation only.

@szetszwo (Contributor) left a comment

@jcshepherd, thanks a lot for contributing this! I have reviewed down to the Snapshot section. Please see the comments so far. I will continue reviewing.


Raft's safety guarantees depend on majority agreement within each group. The leader replicates
each operation to the followers in its group, and operations are committed when at least
(N/2 + 1) peers in that group acknowledge them. This means a group of 3 peers can tolerate 1

Let's use $\lfloor N/2 + 1 \rfloor$.
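For reference, the quorum arithmetic behind this expression can be sketched in a few lines of Java. `QuorumMath` and its method names are hypothetical helpers made up for this illustration; they are not part of Ratis. The sketch assumes a group of N voting peers and relies on the fact that $\lfloor N/2 + 1 \rfloor = \lfloor N/2 \rfloor + 1$ for positive integer N.

```java
/** Quorum arithmetic for a Raft group of n voting peers (illustrative only, not a Ratis class). */
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of peers that forms a majority: floor(n/2) + 1. */
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("a group needs at least one peer");
        }
        return n / 2 + 1; // integer division floors for positive n
    }

    /** Number of peers that can fail while a majority is still reachable. */
    static int tolerableFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.printf("N=%d majority=%d tolerates=%d%n",
                    n, majority(n), tolerableFailures(n));
        }
    }
}
```

For N = 3, 4, and 5 this gives majorities of 2, 3, and 3, and tolerable failures of 1, 1, and 2, which is why an even-sized group tolerates no more failures than the odd-sized group one peer smaller.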

Comment on lines +76 to +78
A single cluster can host multiple independent Raft groups, each with its own leader election,
consistency and state replication. Groups typically consist of an odd number of peers (3, 5, or
7 are common) to ensure clear majority decisions.

Let's combine the first sentence into the preceding paragraph and move the second sentence to the next section, which talks about majority.

diff --git a/ratis-docs/src/site/markdown/concept/index-v2.md b/ratis-docs/src/site/markdown/concept/index-v2.md
index d51146863..bf1463c43 100644
--- a/ratis-docs/src/site/markdown/concept/index-v2.md
+++ b/ratis-docs/src/site/markdown/concept/index-v2.md
@@ -71,18 +71,20 @@ group is a logical consensus domain that runs across a specific subset of peers
-At any given time, one peer in a group acts as the "leader" while the others are "followers" or
+One of the peers in a group acts as the "leader" while the others are "followers" or
 "listeners". The leader handles all write requests and replicates operations to other peers in
 the group. Both leaders and followers can service read requests, with different consistency
-guarantees.
+guarantees. A single cluster can host multiple independent Raft groups, 
+each with its own leader election, consistency and state replication.
 
-A single cluster can host multiple independent Raft groups, each with its own leader election,
-consistency and state replication. Groups typically consist of an odd number of peers (3, 5, or
-7 are common) to ensure clear majority decisions.
 
 ### Majority-Based Decision-Making
 
 Raft's safety guarantees depend on majority agreement within each group. The leader replicates
 each operation to the followers in its group, and operations are committed when at least
-(N/2 + 1) peers in that group acknowledge them. This means a group of 3 peers can tolerate 1
+$\lfloor N/2 + 1 \rfloor$ peers in that group acknowledge them. 
+This means a group of 3 peers can tolerate 1
 failure, a group of five peers can tolerate 2 failures, and so on.
+Since a group of $N$ peers for an even $N$ can tolerate the same number of failures as
+a group of $(N-1)$ peers, groups typically consist of an odd number of peers (3, 5, or
+7 are common) to ensure clear majority decisions.
 
 This majority requirement affects both availability and performance. A group remains available as
 long as a majority of its peers are reachable and functioning. However, every transaction must


### Servers, Clusters, and Groups

A Raft server (also known as a "peer") is a single running instance of your application with

Let's also add "member", i.e.

... (also known as a "peer" or a "member")


A Raft cluster is a physical collection of servers that can participate in consensus. A Raft
group is a logical consensus domain that runs across a specific subset of peers in the cluster.
At any given time, one peer in a group acts as the "leader" while the others are "followers" or

Since a group could temporarily have no leader or more than one leader (an old leader that has not yet timed out), let's remove "At any given time, ", i.e.

One of the peers in a group acts as the "leader" ...

Comment on lines +127 to +128
The state machine is not a finite state machine with states and transitions. Instead, it's a
deterministic computation engine that processes a sequence of operations and maintains some

Let's remove the first sentence since an application can implement its Raft state machine as a finite state machine.

A state machine is a deterministic computation engine ...

after the snapshot, bringing the peer up to the current state of the group.

During normal operation, the state machine continuously processes transactions as they're
committed by the Raft group, responds to leadership changes, and handles read-only queries. For

We could remove "responds to leadership changes" since the state machine does not need to do anything when leadership changes.

### Designing Your State Machine

When designing your state machine, ensure your operations are deterministic and can be
efficiently serialized for replication. Operations must be idempotent, as Raft may occasionally

Idempotence is not a requirement of a state machine; e.g., x = x + 1 is a valid non-idempotent operation. Each transaction is applied exactly once.
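To illustrate the point, here is a minimal sketch of a deterministic but non-idempotent counter. `CounterReplica` is a hypothetical toy class, not the Ratis `StateMachine` API, and the index check is only an assumption used here to model "each transaction is applied exactly once"; it does not describe how Ratis enforces that guarantee.

```java
/** Toy replicated counter (hypothetical stand-in, not the Ratis StateMachine API). */
final class CounterReplica {
    private long value = 0;
    private long lastAppliedIndex = 0; // index of the most recently applied log entry

    /** Applies the entry at the given log index; entries must arrive in order, each once. */
    void apply(long index, long delta) {
        if (index != lastAppliedIndex + 1) {
            throw new IllegalStateException(
                    "expected index " + (lastAppliedIndex + 1) + " but got " + index);
        }
        value += delta; // non-idempotent: a second application would change the result
        lastAppliedIndex = index;
    }

    long value() {
        return value;
    }

    public static void main(String[] args) {
        long[] log = {1, 1, 1}; // three "x = x + 1" transactions
        CounterReplica a = new CounterReplica();
        CounterReplica b = new CounterReplica();
        for (int i = 0; i < log.length; i++) {
            a.apply(i + 1, log[i]);
            b.apply(i + 1, log[i]);
        }
        // Same log, same order, each entry applied once: both replicas agree.
        System.out.println(a.value() == b.value());
    }
}
```

Determinism plus in-order, exactly-once application is enough for replicas to converge; the operations themselves do not need to be safe to repeat.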

### Read Consistency Options

**Linearizable reads** provide the strongest consistency by going through the Raft protocol to
ensure you're reading the most up-to-date committed data. Use the client's `sendReadOnly` method,

sendReadOnly may use a linearizable read or a leader read, depending on the configuration raft.server.read.option.

data if the leader has been partitioned from the majority.

**Follower reads** provide eventual consistency by serving reads directly from followers using
their local state machine. Call `sendReadOnly(message, serverId)` with a specific follower's

When linearizable read is enabled, a follower read using sendReadOnly(message, serverId) is also linearizable.

the snapshot data is loaded replacing any existing state, and the state machine resumes normal
operation by replaying any log entries that occurred after the snapshot.

Your state machine's `initialize` method is responsible for loading snapshots during startup by

The method is reinitialize.
