Comparing changes
base repository: couchbase/indexing
base: master
head repository: couchbase/indexing
compare: morpheus
- 15 commits
- 20 files changed
- 8 contributors
Commits on Nov 19, 2025
- [BP to 8.0.1] MB-69145 Fix alternate shardId population for alter index

BP from MB-68572. When ALTER INDEX is performed on an index with alternate shardIds, the replica's alternate shardIds are derived from the alternate shardIds in the definition passed to replica repair. For example, if replica repair is driven by the definition containing partitions 1 and 2, then the definition containing partitions 3 and 4 will also use the alternate shardIds of partitions 1 and 2. This leads to the affected partitions being built with empty alternate shardIds.

Change-Id: I2aa446632fbcf88cc94c0615c3c85e399a8e42f4
Commit: 350d885
Commits on Nov 21, 2025
- [BP 8.0.1] MB-69234: Add mechanism to retry security context initialisation on error

BP MB-68943. Currently, if the function executed inside a sync.Once fails, the Once is still marked as done. So if any error occurs while initialising the client's security context, the context is left unset and is never retried, because the sync.Once is already marked done.

Fix: introduce a new type, very similar to sync.Once, with the key difference that it is aware of the result (error) returned by the wrapped function and marks itself done only when the function succeeds. On any error it is not marked done and can be invoked again.

Change-Id: I12bf1395187515edb65917bd4e7fedfb1797eeac
Commit: 80ca8c9
- [BP 8.0.1] MB-69232: Don't transition TT to Ready when cancel is called during Merge

BP MB-68558. During startShardRecovery, after the indexes are recovered, the function waits for the index state to become ACTIVE. In that loop all instances are iterated; if an instance reaches ACTIVE but its instId has already been deleted from the processedInsts list by the time finishSuccess() runs, there is no guarantee that the merge of destTokenToMergeOrReady went through. The transfer token could then be transitioned to the next state even though the merge will not happen, and in that case cleanup will not be done by the destination node.

To prevent the transfer token from moving to the next state, an errgroup is used to capture any errors returned by the merge-observer goroutines. An error from any goroutine signifies that the rebalance was either cancelled or done. The errgroup waits for all goroutines to finish and checks for a returned error; if any error is present, the token does not proceed further.

Change-Id: Ie13a85a4096a025a1e34c5770a24f46801a494d9
Commit: 185d95a
- [BP 8.0.1] MB-69175: Improve debuggability in RestoreShardDone

BP MB-68906. Currently, responses and errors returned by the storage engine are not logged by GSI, so information about the storage layer's bookkeeping can be missing from the logs when debugging.

Change-Id: I741dc074a57a586043f22f0157256d3cd6d2e4ee
Commit: ce6fb9d
Commits on Nov 25, 2025
- [BP to 8.0.1] MB-69391 update graph status during merge/prune

BP from MB-69229. During rebalance, Bhive proxies were merging into their real instances without copying the in-memory BhiveGraphStatus, and pruned partitions left behind stale entries. This led to mismatches where checkDDLInProgress would see stale or missing graph status.

Changes: copy each partition's graph-ready flag from the source proxy when merging, and delete the entry when pruning, so the in-memory map always matches the local partition set.

Change-Id: I79e04944a46950aefbbcc857d9e13ba8fd858fdb
Commit: 9b86108
- [BP to 8.0.1] MB-69390 return DDL in progress for bhive graph build

BP from MB-68576. Bhive graph build is a CPU-intensive task, so it can saturate the CPU when run in conjunction with rebalance. For this reason the rebalancer already makes rebalance wait until the Bhive graph build is completed. Instead of rebalance being stuck for a long time (several hours for large indexes), it is better to reject rebalance by treating the graph build as DDL in progress.

Since there isn't an instance-wide GraphBuildPhase like there is a TrainingPhase, introducing one was considered, as it would have made this check cleaner and could be used elsewhere. But adding a new field and all of its transitions was not worth it for just this one check.

Change-Id: If115aefa84f2040f417c1889e792edaaf71aa595
Commit: afdf91c
Commits on Dec 11, 2025
- MB-69465 [BP 8.0.1]: Add config to toggle skiplist node padding

Toggling this config requires an indexer restart to apply it to existing indexes, since the skiplist node layout cannot be changed easily at runtime.

Change-Id: Ib7a9bce8967b6f0298685159da99ca0a8326d7b4 (cherry picked from commit 7a60aa9)
Commit: 3168596
Commits on Dec 22, 2025
- MB-69944: [BP 8.0.1] Monitor indexer failover in projector and close conn

Backport of MB-65760. Projector was timing out around 15 minutes after indexer failover because the dataport client held a stale TCP connection. When the indexer node failed over or restarted uncleanly, no FIN/RST was sent, leaving the connection half-open. This can also happen when nodes come back with a new IP after failover.

The client continued to use the old TCP stream until the OS TCP stack exhausted its retransmission attempts, which delayed error detection and prolonged recovery. This change adds proper detection and cleanup of half-open TCP connections so the projector reconnects immediately after failover or server restart.

Change-Id: Ice7ea750636ea59b4b49a3d98fec0a3e6e25b50f (cherry picked from commit 84b3963)
Commit: f7ba7ef
Commits on Dec 24, 2025
- MB-69925 Add Missing # TYPE Annotations for Metrics in handleMetricsHigh

The populateMetrics() function was outputting Prometheus metrics without the required # TYPE declarations, causing inconsistency with handleMetrics(), which correctly outputs both the metric type and value.

Fix: added # TYPE declarations to the populateMetrics() and populateIsDivergingReplicaStat() functions.

Change-Id: I62a1d30886a0f305b5cd36dca48485c10115db91
Commit: 184d752
Commits on Dec 25, 2025
- MB-69953 [BP 8.0.1] Implement dynamic timer update during stream merge

Backport of MB-64242. This change allows the timekeeper to slow down the MAINT_STREAM relative to the INIT_STREAM and facilitate the stream merge. Generally, the MAINT_STREAM is expected to run slower than the INIT_STREAM during the catchup phase, as it processes more indexes. But with collection-based data modelling it is possible that the MAINT_STREAM is handling indexes of a collection with a very low workload while the INIT_STREAM has index(es) with a high workload, and the INIT_STREAM may not be able to catch up.

This patch adds the ability for the timekeeper to identify long-running stream merges and slow down the MAINT_STREAM. As the MAINT_STREAM actively serves scans, slowing it down adds to scan latency, so this action is taken incrementally and only up to a configurable maximum interval.

New configs:
- timekeeper.mergePhase.maxTimerInterval (default: 100ms, 0 disables)
- timekeeper.mergePhase.tsQueueThreshold (default: 500)
- timekeeper.maintStream.forcedDelay (default: 0)

The last config allows external throttling of the MAINT_STREAM in case automatic throttling is not sufficient.

Change-Id: I71bca2810de85277516d9c4e229f1f5318f3ad80
Commit: 66b619a
Commits on Jan 8, 2026
- [BP to 8.0.1] MB-69908 Skip vector sampling for indexes with recovered codebooks

BP from MB-68640. During shard-based rebalance, vector index codebooks are transferred along with the index data, and the destination node recovers the codebook from disk. However, the training flow still fetched sample vectors from KV even though they were never used, wasting resources.

Changes:
- Add instHasTrainedCodebook() to check if any slice has a trained codebook using slice.IsTrained() (avoids memory allocation)
- Skip sampling when defn-level reuse or a trained slice can supply a codebook; only call FetchSampleVectorsForIndexes for instances that actually need it
- Fix: set successMap to nil on early return to avoid duplicate reporting in retry scenarios

Cases handled:
- Rebalance/Resume: a trained slice on this instance, its real inst, or other instances of the defn lets all defn instances skip sampling
- ALTER INDEX: trained active replicas of the defn, or trained slices on this instance, let the defn skip sampling
- Mixed batch: if sampling fails, codebook-backed instances continue; if all skip sampling, no KV fetch occurs

Change-Id: I8ca06178d3b4d718525cc52850e0b5657a89d925
Commit: 63951c3
Commits on Jan 12, 2026
- [BP to 8.0.1] MB-69909 Reject merge if source/target instances are in training phase

BP from MB-68544. If the source or target index instances are in the training phase, reject the merge. The indexer will trigger the merge after training is done for those instances (if the keyspace is idle, the indexer needs to force a merge after training is done to properly merge the erroneous instances).

After training, if source and target do not agree on the same training state, fail the rebalance. This can happen in the following case:
1. Real instance got a build request and training initiated
2. A proxy (p1) moved to the node and started its build
3. Real instance did not find any documents and was marked for training error
4. While proxy p1's training is in progress, another proxy p2 started training, so both realInst and p2 have started to train
5. p1 moves to TRAINING_NOT_STARTED, but training for realInst and p2 now succeeds, so they move to the TRAINING_COMPLETED state
6. In this case, realInst and proxy p1 cannot be merged, and rebalance would fail

Although this is a rare race condition, it is better to handle it. A retry of rebalance should fix the issue, as the documents are added in the middle of training the different proxies.

Change-Id: I01c704a9eb0f2431304d426536dab84fc8ca941a
Commit: 1e8e45a
Commits on Jan 14, 2026
- MB-69935 call restoreShardDone immediately post recovery

[BP to 8.0.1] Background: the expectation with restoreShardDone is that it will be called after recovery of all indexes; this signals plasma that the required indexes are done with restore and it can proceed with cleaning up dead instances, starting LSS cleaners, etc. Indexes whose shards have not had restoreShardDone called should not be used if the indexer crashes and bootstrap recovery happens, primarily because such indexes are expected to be not rebalance-active and to be cleaned up. This is not true for non-empty-node batched indexes, leading to the shard-corruption bug.

Fix: to resolve the semantic issues in calling the restoreShardDone API, call it once recovery of all indexes is complete, before transitioning to the next state. This guarantees that the indexes which can be recovered after a crash are either from a shard which was marked done, or will be deleted.

Assumption: plasma assumes that GSI will not recover indexes whose shards have not had restoreShardDone called before crashes.

Change-Id: I1b75a34e62ce529d407faee301a2cb12cdcbc873
Signed-off-by: Dhruvil Shah <dsdruvil8@gmail.com>
Commit: d5e84ae
Commits on Jan 21, 2026
- MB-69935 don't call restoreShardDone out of rebalance context

[BP to 8.0.1] Background: as described in the ticket, restoreShardDone should not be called outside of rebalance context. The last remaining such call is in RestoreAndUnlockShard, which is invoked during rebalance cleanup to restore shards that are locked for recovery and pending ready. It can no longer happen that we have shards which are ready/pending ready without restoreShardDone having been called; all shards which have not undergone restoreShardDone are expected to be dropped.

Fix: restoreShardDone is not required in cleanup; we can have shards locked for recovery, but not shards which have not undergone restoreShardDone, so only unlock the shards in such cases.

Tests: the existing functional test in CI, TestVectorIndexShardRebalance/TestRebalanceCancelIndexerAfterRecovery, already tests for the behaviour we are aiming for.

Change-Id: I366942977417c5f58d80d8e2bfc1b43a15bae3fe
Signed-off-by: Dhruvil Shah <dsdruvil8@gmail.com>
Commit: 23c172c
- [BP 8.0.1] MB-69906: Log a warning and suppress error to storage manager

Whenever the Bhive shard manager is not initialised (which can happen when no Bhive index has been created on the node), an error log was printed. Fix: for the storage-not-initialised error, print a warning and suppress the error returned to the storage manager. Suppressing the error should have no side effect, since the bhiveShards slice returned would be nil.

Change-Id: I46d29d66a402aed57086cd76992221959e14ab75 (cherry picked from commit 7366285)
Commit: 1a5950b
This comparison is taking too long to generate, and GitHub can't render it right now; it might be too big. You can run this command locally to see the comparison on your machine:
git diff master...morpheus