Question: Can RD-Agent reuse previously generated factors (e.g. MOM20) as inputs when constructing new factors? #1310

@lajiman

Description

When running the RD-Agent (Fin-Factor / Qlib) demo, I observed a limitation in how factors can be reused across iterations.

In short:

Newly generated factors (e.g. MOM20) can be evaluated and referenced conceptually in later loops, but they cannot be used as base columns when constructing new factors.
For example, I cannot write df["$MOM20"] in the next factor’s implementation, and I seem to always have to start from raw columns like df["$close"] again.

This also seems to apply to the baseline features defined in the YAML (e.g. the Alpha158-like feature block). They are available for modeling, but not directly usable as “building blocks” when constructing new factors in later iterations.


Context / Example

In my Qlib config, I have a baseline feature block like this (simplified):

feature:
  - ["Resi($close, 5)/$close", "Std(Abs($close/Ref($close, 1)-1)*$volume, 5)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 5)+1e-12)",
     "Rsquare($close, 5)", "($high-$low)/$open", "Rsquare($close, 10)", "Corr($close, Log($volume+1), 5)",
     "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 5)", "Corr($close, Log($volume+1), 10)",
     "Ref($close, 60)/$close", "Resi($close, 10)/$close", "Std($volume, 5)/($volume+1e-12)",
     "Rsquare($close, 60)", "Corr($close, Log($volume+1), 60)", "Std(Abs($close/Ref($close, 1)-1)*$volume, 60)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 60)+1e-12)",
     "Std($close, 5)/$close", "Rsquare($close, 20)", "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 60)",
     "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 10)", "Corr($close, Log($volume+1), 20)",
     "(Less($open, $close)-$low)/$open",
     "($close/Ref($close, 5)-1)",   "($close/Ref($close, 10)-1)",  "($close/Ref($close, 20)-1)",
     "($volume/Ref($volume, 5)-1)"]
  - ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10", 
     "ROC60", "RESI10", "VSTD5", "RSQR60", "CORR60", "WVMA60", "STD5", 
     "RSQR20", "CORD60", "CORD10", "CORR20", "KLOW",
     "PMOM5", "PMOM10", "PMOM20",
     "VMOM5"]

These features (e.g. RESI5, KLEN) exist conceptually as factors, and I can see them in the data / factor outputs.
However, when RD-Agent generates new factor code, I cannot reliably write something like:

df["$NEW_FACTOR"] = df["$RESI5"] * df["$KLEN"]

Instead, the new factor still has to be formulated from the raw columns (e.g. $close, $volume, etc.), rather than reusing $PMOM20, $MOM20, or other already-defined factor columns.
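For illustration, one workaround that stays within raw columns is to recompute the base factors inside the new factor's own code. This is only a sketch on synthetic data: the helper names (make_demo_frame, pmom, klen) are hypothetical, and the (datetime, instrument) MultiIndex layout is my assumption about how the daily price/volume frame is organized. The formulas follow the PMOM20 and KLEN expressions from the feature block above.

```python
import numpy as np
import pandas as pd


def make_demo_frame(n_days=30, instruments=("A", "B"), seed=0):
    """Synthetic stand-in for the raw price frame (assumed layout, not real data)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2020-01-01", periods=n_days, freq="D")
    idx = pd.MultiIndex.from_product(
        [dates, list(instruments)], names=["datetime", "instrument"]
    )
    close = 100 + rng.uniform(-5, 5, len(idx))
    open_ = close + rng.uniform(-1, 1, len(idx))
    high = np.maximum(open_, close) + 0.5
    low = np.minimum(open_, close) - 0.5
    return pd.DataFrame(
        {"$open": open_, "$close": close, "$high": high, "$low": low}, index=idx
    )


def pmom(df, window):
    """($close / Ref($close, window) - 1), shifted within each instrument."""
    close = df["$close"]
    return close / close.groupby(level="instrument").shift(window) - 1


def klen(df):
    """($high - $low) / $open."""
    return (df["$high"] - df["$low"]) / df["$open"]


df = make_demo_frame()
# Compose the new factor from recomputed building blocks instead of df["$PMOM20"].
new_factor = (pmom(df, 20) * klen(df)).rename("NEW_FACTOR")
```

This works, but every composed factor has to carry the full definition of its inputs, which is exactly the duplication I would like to avoid.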


My Hypothesis

My understanding of the pipeline (based on rdagent/components/coder/factor_coder/factor.py) is:

  1. daily_pv.h5 is passed to Qlib, which can generate the baseline factors (and possibly previously discovered factors) correctly.
  2. RD-Agent adds an extra validation layer before handing data to Qlib, or before committing new factors into the “official” feature space.
  3. During this validation phase, if a new factor references another generated factor (e.g. df["$MOM20"]), the validation fails because that column is not yet recognized / registered at that stage.
    As a result, factors that depend on other factors fail in validation, and thus never enter the official factor set.

This creates a “bad loop”:

  • At run time (once the factor is fully integrated into Qlib), the factor exists and can be used by models;
  • But during RD-Agent’s factor validation / generation stage, reusing that factor as a base column is not allowed, so new “factor-of-factor” designs are effectively blocked.
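The failure mode I am hypothesizing can be reproduced in isolation. The sketch below is not RD-Agent's actual validation code: check_factor is a hypothetical stand-in that simply runs a factor function against a frame containing only raw columns and reports any missing column.

```python
import pandas as pd

RAW_COLUMNS = ["$open", "$close", "$high", "$low", "$volume"]


def check_factor(factor_fn, frame):
    """Hypothetical validation step: run the factor and report a missing column."""
    try:
        factor_fn(frame)
        return "ok"
    except KeyError as exc:
        return f"missing column: {exc.args[0]}"


# Frame that exposes only raw columns, as I assume the validation stage does.
raw = pd.DataFrame({col: [1.0, 2.0, 3.0] for col in RAW_COLUMNS})


def factor_from_raw(df):
    # Built only from a raw column, so it passes.
    return df["$close"].pct_change(1)


def factor_from_factor(df):
    # References a previously generated factor that is not in the frame.
    return df["$MOM20"] * 2


status_raw = check_factor(factor_from_raw, raw)
status_factor = check_factor(factor_from_factor, raw)
```

If the real validation stage behaves like this, any factor that reads df["$MOM20"] is rejected before Qlib ever sees it.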

Why this matters

Practically, this limits the ability to “build with Lego blocks”:

  • I would like to stack / compose factors across loops:
    e.g. build regime-adjusted factors, interaction terms, non-linear transforms on top of already discovered good factors (MOM20, carry, seasonality, etc.).
  • If every new factor must always start from raw $close, $volume, etc., then the search space for higher-order, domain-specific factors is much more constrained.
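As a concrete example of the kind of stacking I have in mind, here is a minimal sketch on synthetic data that takes a 20-day momentum factor, scales it by 20-day return volatility, and then ranks the result across instruments on each date. The data, the (datetime, instrument) index layout, and all variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2021-01-01", periods=60, freq="D")
idx = pd.MultiIndex.from_product(
    [dates, ["A", "B", "C"]], names=["datetime", "instrument"]
)
df = pd.DataFrame({"$close": 50 + rng.uniform(0, 10, len(idx))}, index=idx)

close_by_inst = df["$close"].groupby(level="instrument")

# Block 1: 20-day momentum, the "already discovered" factor.
mom20 = df["$close"] / close_by_inst.shift(20) - 1

# Block 2: 20-day volatility of daily returns, computed per instrument.
ret1 = df["$close"] / close_by_inst.shift(1) - 1
vol20 = ret1.groupby(level="instrument").transform(lambda s: s.rolling(20).std())

# Composition: volatility-adjusted momentum, then a cross-sectional percentile rank.
risk_adj_mom = mom20 / vol20
ranked = risk_adj_mom.groupby(level="datetime").rank(pct=True)
```

With reusable factor columns, the first block would be a single lookup of the existing factor rather than a recomputation from $close.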

Questions

  1. Is my understanding of this behavior correct?

  2. If yes, is this a deliberate design choice, or just a current limitation of the implementation?

  3. Is there a recommended way to:

    • either register previously generated factors as official Qlib feature columns that can be referenced in later factor code; or
    • modify the pipeline so that factor validation is aware of previously generated factor columns?
  4. Would you consider supporting “factor-of-factor” construction (reusing validated factors as building blocks) as a future feature for RD-Agent-Quant?
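To make the second option in question 3 concrete, the sketch below joins previously generated factor outputs onto the raw frame before new factor code runs. Everything here is hypothetical: augment_with_factors is not an RD-Agent function, and I am assuming that earlier factor outputs are available as Series indexed by (datetime, instrument).

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2022-01-03", periods=3, freq="D"), ["A", "B"]],
    names=["datetime", "instrument"],
)
raw = pd.DataFrame({"$close": [10.0, 20.0, 11.0, 19.0, 12.0, 18.0]}, index=idx)

# Hypothetical outputs of earlier loops, keyed by factor name.
prior_factors = {
    "MOM20": pd.Series([0.1, -0.2, 0.3, 0.0, 0.2, 0.1], index=idx),
}


def augment_with_factors(raw_df, factors):
    """Return a copy of raw_df with each prior factor joined as a "$NAME" column."""
    out = raw_df.copy()
    for name, values in factors.items():
        # Reindexing on (datetime, instrument) keeps rows matched to the raw data.
        out[f"${name}"] = values.reindex(out.index)
    return out


augmented = augment_with_factors(raw, prior_factors)
# New factor code can now reference the earlier factor as a base column.
new_factor = augmented["$MOM20"] * augmented["$close"]
```

If validation ran against a frame like augmented instead of the raw one, df["$MOM20"] would resolve and factor-of-factor code could pass.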


Environment

I believe this question is independent of the environment.

Labels: question