Description
When running the RD-Agent (Fin-Factor / Qlib) demo, I observed a limitation in how factors can be reused across iterations.
In short:
Newly generated factors (e.g. MOM20) can be evaluated and referenced conceptually in later loops, but they cannot be used as base columns when constructing new factors.
For example, I cannot write df["$MOM20"] in the next factor’s implementation, and I seem to always have to start from raw columns like df["$close"] again.
This also seems to apply to the baseline features defined in the YAML (e.g. the Alpha158-like feature block). They are available for modeling, but not directly usable as “building blocks” when constructing new factors in later iterations.
Context / Example
In my Qlib config, I have a baseline feature block like this (simplified):
feature:
    - ["Resi($close, 5)/$close", "Std(Abs($close/Ref($close, 1)-1)*$volume, 5)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 5)+1e-12)",
       "Rsquare($close, 5)", "($high-$low)/$open", "Rsquare($close, 10)", "Corr($close, Log($volume+1), 5)",
       "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 5)", "Corr($close, Log($volume+1), 10)",
       "Ref($close, 60)/$close", "Resi($close, 10)/$close", "Std($volume, 5)/($volume+1e-12)",
       "Rsquare($close, 60)", "Corr($close, Log($volume+1), 60)", "Std(Abs($close/Ref($close, 1)-1)*$volume, 60)/(Mean(Abs($close/Ref($close, 1)-1)*$volume, 60)+1e-12)",
       "Std($close, 5)/$close", "Rsquare($close, 20)", "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 60)",
       "Corr($close/Ref($close, 1), Log($volume/Ref($volume, 1)+1), 10)", "Corr($close, Log($volume+1), 20)",
       "(Less($open, $close)-$low)/$open",
       "($close/Ref($close, 5)-1)", "($close/Ref($close, 10)-1)", "($close/Ref($close, 20)-1)",
       "($volume/Ref($volume, 5)-1)"]
    - ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10",
       "ROC60", "RESI10", "VSTD5", "RSQR60", "CORR60", "WVMA60", "STD5",
       "RSQR20", "CORD60", "CORD10", "CORR20", "KLOW",
       "PMOM5", "PMOM10", "PMOM20",
       "VMOM5"]

These features (e.g. RESI5, KLEN) exist conceptually as factors, and I can see them in the data / factor outputs.
However, when RD-Agent generates new factor code, I cannot reliably write something like:
    df["$NEW_FACTOR"] = df["$RESI5"] * df["$KLEN"]

Instead, the new factor still has to be formulated from the raw columns (e.g. $close, $volume, etc.), rather than reusing $PMOM20, $MOM20, or other already-defined factor columns.
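The failure mode I have in mind can be imitated outside RD-Agent with a toy frame. This is only a sketch: the frame, its values, and the `try_factor` helper are made up for illustration, and I am assuming the factor code sees a frame that contains raw columns only.

```python
import pandas as pd

# Toy stand-in for the raw price/volume frame (values are made up).
df = pd.DataFrame({"$close": [10.0, 11.0, 12.0], "$volume": [100.0, 120.0, 90.0]})


def try_factor(frame):
    """Return the name of the first missing column, or None if the factor computes."""
    try:
        # Factor-of-factor expression that relies on derived columns.
        frame["$RESI5"] * frame["$KLEN"]
    except KeyError as err:
        return err.args[0]
    return None


missing = try_factor(df)
print(missing)
```

With only raw columns present, the lookup of the first derived column already fails, which is the behavior I would expect if derived factors are never merged into the input frame.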
My Hypothesis
My understanding of the pipeline (based on rdagent/components/coder/factor_coder/factor.py) is:
- daily_pv.h5 is passed to Qlib, which can generate the baseline factors (and possibly previously discovered factors) correctly.
- RD-Agent adds an extra validation layer before handing data to Qlib, or before committing new factors into the “official” feature space.
- During this validation phase, if a new factor references another generated factor (e.g. df["$MOM20"]), the validation fails because that column is not yet recognized / registered at that stage.
As a result, factors that depend on other factors fail in validation, and thus never enter the official factor set.
This creates a “bad loop”:
- In actual running (once the factor is fully integrated into Qlib), the factor can exist and be used by models;
- But during RD-Agent’s factor validation / generation stage, reusing that factor as a base column is not allowed, so new “factor-of-factor” designs are effectively blocked.
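To make the hypothesis concrete, here is a rough sketch of a column check that knows about previously registered factors. Everything in it is hypothetical (`COLUMN_REF`, `referenced_columns`, `missing_columns`, and the `RAW` set are names I made up); it is not RD-Agent's actual validation code, just an illustration of the difference between validating against raw columns only and validating against raw columns plus a factor registry.

```python
import re

# Matches df["$NAME"] or df['$NAME'] and captures the column name.
COLUMN_REF = re.compile(r"""df\[\s*["'](\$\w+)["']\s*\]""")


def referenced_columns(factor_source):
    """Return every $-prefixed column that the factor source indexes on df."""
    return set(COLUMN_REF.findall(factor_source))


def missing_columns(factor_source, available, output):
    """Columns the factor reads that are neither available nor its own output."""
    return referenced_columns(factor_source) - set(available) - {output}


# Assumed raw columns, taken from the expressions in the YAML block.
RAW = {"$open", "$high", "$low", "$close", "$volume"}
source = 'df["$NEW_FACTOR"] = df["$RESI5"] * df["$KLEN"]'

# Validation that only knows raw columns rejects the factor ...
raw_only = missing_columns(source, RAW, "$NEW_FACTOR")
# ... while one that also knows registered factors accepts it.
with_registry = missing_columns(source, RAW | {"$RESI5", "$KLEN"}, "$NEW_FACTOR")
print(sorted(raw_only), sorted(with_registry))
```

If the real validation behaves like the first call, any factor built on another factor would be rejected regardless of whether the dependency was already validated.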
Why this matters
Practically, this limits the ability to “build with Lego blocks”:
- I would like to stack / compose factors across loops, e.g. build regime-adjusted factors, interaction terms, non-linear transforms on top of already discovered good factors (MOM20, carry, seasonality, etc.).
- If every new factor must always start from raw $close, $volume, etc., then the search space for higher-order, domain-specific factors is much more constrained.
Questions
- Is my understanding of this behavior correct?
- If yes, is this a deliberate design choice, or just a current limitation of the implementation?
- Is there a recommended way to:
  - either register previously generated factors as official Qlib feature columns that can be referenced in later factor code; or
  - modify the pipeline so that factor validation is aware of previously generated factor columns?
- Would you consider supporting “factor-of-factor” construction (reusing validated factors as building blocks) as a future feature for RD-Agent-Quant?
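For reference, the kind of behavior I am asking about could look roughly like the sketch below: previously computed factor columns are joined onto the raw frame before the next factor's code runs. This is an untested idea, not something RD-Agent provides as far as I know; `with_prior_factors`, `$MOM1`, and the toy data are all hypothetical, and I am assuming both frames share a (datetime, instrument) index.

```python
import pandas as pd


def with_prior_factors(raw, prior):
    """Left-join factor columns from `prior` onto `raw` by index, skipping name clashes."""
    new_cols = [c for c in prior.columns if c not in raw.columns]
    return raw.join(prior[new_cols], how="left")


# Toy data: three days, two instruments (values are made up).
idx = pd.MultiIndex.from_product(
    [pd.date_range("2024-01-01", periods=3), ["A", "B"]],
    names=["datetime", "instrument"],
)
raw = pd.DataFrame({"$close": [10.0, 20.0, 11.0, 19.0, 12.0, 18.0]}, index=idx)

# A factor produced in an earlier loop: 1-day momentum per instrument.
prior = raw.groupby(level="instrument")["$close"].pct_change().to_frame("$MOM1")

# Later loop: the earlier factor is now usable as a base column.
df = with_prior_factors(raw, prior)
df["$MOM1_SQ"] = df["$MOM1"] ** 2
print(df)
```

Whether something like this should happen inside the validation step or at data-preparation time is exactly what I am unsure about.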
Environment
I think this question is unrelated to the environment.