Question: Can RD-Agent reuse previously generated factors (e.g. MOM20) as inputs when constructing new factors?

### Description

When running the RD-Agent (Fin-Factor / Qlib) demo, I observed a limitation in how factors can be reused across iterations.

In short:

> Newly generated factors (e.g. MOM20) can be evaluated and referenced conceptually in later loops, but they cannot be used as *base columns* when constructing new factors.
> For example, I cannot write `df["$MOM20"]` in the next factor’s implementation, and I seem to always have to start from raw columns like `df["$close"]` again.

> This also seems to apply to the baseline features defined in the YAML (e.g. the Alpha158-like feature block). They are available for modeling, but not directly usable as “building blocks” when constructing new factors in later iterations.

---

### Context / Example

In my Qlib config, I have a baseline feature block like this (simplified):

```yaml
feature:
  - ["Resi($close, 5)/$close", "Std(Abs($close/Ref($close, 1)-1)*$volume, 5)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 5)+1e-12)",
     "Rsquare($close, 5)", "($high-$low)/$open", "Rsquare($close, 10)", "Corr($close, Log($volume+1), 5)",
     "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 5)", "Corr($close, Log($volume+1), 10)",
     "Ref($close, 60)/$close", "Resi($close, 10)/$close", "Std($volume, 5)/($volume+1e-12)",
     "Rsquare($close, 60)", "Corr($close, Log($volume+1), 60)", "Std(Abs($close/Ref($close, 1)-1)*$volume, 60)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 60)+1e-12)",
     "Std($close, 5)/$close", "Rsquare($close, 20)", "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 60)",
     "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 10)", "Corr($close, Log($volume+1), 20)",
     "(Less($open, $close)-$low)/$open",
     "($close/Ref($close, 5)-1)",   "($close/Ref($close, 10)-1)",  "($close/Ref($close, 20)-1)",
     "($volume/Ref($volume, 5)-1)"]
  - ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10", 
     "ROC60", "RESI10", "VSTD5", "RSQR60", "CORR60", "WVMA60", "STD5", 
     "RSQR20", "CORD60", "CORD10", "CORR20", "KLOW",
     "PMOM5", "PMOM10", "PMOM20",
     "VMOM5"]
```

These features (e.g. `RESI5`, `KLEN`) **exist conceptually as factors**, and I can see them in the data / factor outputs.
However, when RD-Agent generates new factor code, I cannot reliably write something like:

```python
df["$NEW_FACTOR"] = df["$RESI5"] * df["$KLEN"]
```

Instead, the new factor still has to be formulated from the raw columns (e.g. `$close`, `$volume`, etc.), rather than reusing `$PMOM20`, `$MOM20`, or other already-defined factor columns.
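
To make the contrast concrete, here is a minimal pandas sketch of the current workaround: recomputing `PMOM20` from `$close` inside the new factor instead of reading it as a column. The `pmom` helper, the synthetic data, and the `(datetime, instrument)` index layout are assumptions for illustration, not RD-Agent or Qlib code.

```python
import numpy as np
import pandas as pd


def pmom(close: pd.Series, window: int) -> pd.Series:
    """Pandas equivalent of "$close/Ref($close, window)-1", per instrument."""
    prev = close.groupby(level="instrument").shift(window)
    return close / prev - 1


# Synthetic stand-in for the raw price data (assumed index layout).
dates = pd.date_range("2024-01-01", periods=30, freq="D")
index = pd.MultiIndex.from_product(
    [dates, ["A", "B"]], names=["datetime", "instrument"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame({"$close": rng.uniform(10, 20, len(index))}, index=index)

# What I would like to write: df["$PMOM20"] * df["$PMOM5"]
# What I have to write today: rebuild both terms from $close.
pmom20 = pmom(df["$close"], 20)
new_factor = pmom20 * pmom(df["$close"], 5)
```

The recomputation works, but every higher-order factor has to carry the full definition of each factor it builds on.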

---

### My Hypothesis

My understanding of the pipeline (based on `rdagent/components/coder/factor_coder/factor.py`) is:

1. `daily_pv.h5` is passed to Qlib, which can generate the baseline factors (and possibly previously discovered factors) correctly.
2. RD-Agent adds an extra validation layer before handing data to Qlib, or before committing new factors into the “official” feature space.
3. During this validation phase, if a new factor references another generated factor (e.g. `df["$MOM20"]`), the validation fails because that column is not yet recognized / registered at that stage.
   As a result, factors that depend on other factors fail in validation, and thus never enter the official factor set.

This creates a “bad loop”:

* At run time (once the factor is fully integrated into Qlib), the factor exists and can be used by models;
* But during RD-Agent’s factor validation / generation stage, reusing that factor as a base column is not allowed, so new “factor-of-factor” designs are effectively blocked.
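
The failure mode I am hypothesizing can be reproduced with plain pandas. This is only an illustration of the hypothesis, not RD-Agent's actual validation code; the `raw` frame and the `factor_of_factor` function are made-up names, and the assumption is that the validation step sees only raw columns.

```python
import pandas as pd

# Assumed: the frame seen during validation holds only raw columns.
raw = pd.DataFrame({"$close": [10.0, 11.0, 12.1], "$volume": [100, 120, 90]})


def factor_of_factor(df: pd.DataFrame) -> pd.Series:
    """A new factor that tries to build on a previously generated one."""
    return df["$MOM20"] * 2


try:
    factor_of_factor(raw)
    failed = False
except KeyError as exc:
    # pandas raises KeyError because "$MOM20" is not a column of `raw`.
    failed = True
    missing = exc.args[0]
```

If validation behaves like this, any factor that reads a derived column is rejected before it can reach Qlib.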

---

### Why this matters

Practically, this limits the ability to “build with Lego blocks”:

* I would like to **stack / compose** factors across loops:
  e.g. build regime-adjusted factors, interaction terms, non-linear transforms on top of already discovered good factors (`MOM20`, `carry`, `seasonality`, etc.).
* If every new factor must always start from raw `$close`, `$volume`, etc., then the search space for higher-order, domain-specific factors is much more constrained.

---

### Questions

1. Is my understanding of this behavior correct?
2. If yes, is this a deliberate design choice, or just a current limitation of the implementation?
3. Is there a recommended way to:

   * either register previously generated factors as *official* Qlib feature columns that can be referenced in later factor code; or
   * modify the pipeline so that factor validation is aware of previously generated factor columns?
4. Would you consider supporting “factor-of-factor” construction (reusing validated factors as building blocks) as a future feature for RD-Agent-Quant?
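
For the second option in question 3, the kind of change I have in mind is roughly the following: join earlier factor outputs onto the raw frame before the new factor code runs. This is a sketch under assumptions, not a patch; `with_previous_factors` is a hypothetical helper, and I have not checked where (or whether) it could hook into RD-Agent.

```python
from __future__ import annotations

import pandas as pd


def with_previous_factors(
    raw: pd.DataFrame, factors: dict[str, pd.Series]
) -> pd.DataFrame:
    """Return a copy of `raw` with each earlier factor added as a "$NAME" column."""
    out = raw.copy()
    for name, values in factors.items():
        column = f"${name}"
        if column in out.columns:
            raise ValueError(f"column {column} already exists")
        # Align on the raw index; rows without a factor value become NaN.
        out[column] = values.reindex(out.index)
    return out


index = pd.MultiIndex.from_product(
    [pd.date_range("2024-01-01", periods=3, freq="D"), ["A", "B"]],
    names=["datetime", "instrument"],
)
raw = pd.DataFrame({"$close": [10.0, 20.0, 11.0, 19.0, 12.0, 21.0]}, index=index)
mom20 = pd.Series([0.1, -0.2, 0.3, 0.0, -0.1, 0.2], index=index, name="MOM20")

df = with_previous_factors(raw, {"MOM20": mom20})
df["$NEW_FACTOR"] = df["$MOM20"] * df["$close"]
```

With something like this in place, the validation step would see `$MOM20` as an ordinary column and the factor code above would no longer need to start from `$close`.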

---

### Environment
I believe this question is independent of the environment.

