-
Notifications
You must be signed in to change notification settings - Fork 1
ComponentCollection #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
Code coverage is not 100%, but I think all the essentials are there, and I'd rather get feedback on the important things than polish tests that may need to change anyway :) Edit: apparently codecov is upset about the missing tests, so I added tests and discovered a minor bug in the process. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #56 +/- ##
===========================================
+ Coverage 96.60% 97.67% +1.06%
===========================================
Files 12 13 +1
Lines 412 644 +232
===========================================
+ Hits 398 629 +231
- Misses 14 15 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
rozyczko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part 1 after a quick look
| Numeric = Union[float, int] | ||
|
|
||
|
|
||
| class SampleModel(ObjBase, MutableMapping): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you inherit from TheoreticalModelBase or at least from BaseObjectCollection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably? I'm not exactly sure
| params = [] | ||
| for comp in self.components.values(): | ||
| params.extend(comp.get_parameters()) | ||
| return params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just
from itertools import chain
# Create generator for temperature parameter
temp_params = (self._temperature,) if self._temperature is not None else ()
# Create generator for component parameters
comp_params = (param for comp in self.components.values()
for param in comp.get_parameters())
# Chain them together and return as list
return list(chain(temp_params, comp_params))(slight teaching aspect for itertools - a VERY important python module)
| if is_not_fixed and is_independent: | ||
| fit_parameters.append(parameter) | ||
|
|
||
| return fit_parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, not a bad loop (easy to see what's being done) but if you want to make it more "modular" you could use list comprehension, e.g.
def is_fit_parameter(param: Parameter) -> bool:
"""Check if a parameter can be used for fitting."""
return (not getattr(param, "fixed", False) and
getattr(param, "_independent", True))
return [param for param in self.get_parameters() if is_fit_parameter(param)]|
@rozyczko Would you have another look when you have time? I'm working on other things the rest of the day and am away tomorrow, so no rush. |
rozyczko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments after pondering on the file for too long ;)
| """ | ||
|
|
||
| CollectionBase.__init__(self, name=name) | ||
| TheoreticalModelBase.__init__(self, name=name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be just
super().__init__(name=name)to properly handle multiple inheritances. Otherwise it violates MRO best practices and makes maintenance more difficult (by being too explicit)
| ------- | ||
| List[ModelComponent] | ||
| """ | ||
| return list(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class mixes direct list access (list(self)) with a components property. Pick one approach:
- Either make
componentsthe primary interface and removelist(self)calls - Or fully embrace the collection interface and remove the
componentsproperty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will embrace the collection
| Name to assign to the component. If None, uses the component's own name. | ||
| """ | ||
| if not isinstance(component, ModelComponent): | ||
| raise TypeError("component must be an instance of ModelComponent.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe uppercase Component since it is the start of the sentence?
|
|
||
| # Add initial components if provided. Mostly used for serialization. | ||
| if data: | ||
| # Just to be safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment adds no value. Either explain WHY it's needed or remove it, please.
|
|
||
|
|
||
| class SampleModel(CollectionBase, TheoreticalModelBase): | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be useful to add __len__ so querying the model is simpler
>>> model = SampleModel()
>>> model.add_component(Gaussian(name="g1"))
>>> len(model)To do this, just query super():
def __len__(self) -> int:
return super().__len__()There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already works! :)
| self.insert(index=len(self), value=component) | ||
|
|
||
| def remove_component(self, name: str): | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why find all indices if you only delete the first? This is either inefficient or you expect duplicate names?
Just delete the first index found.
| List[ModelComponent] | ||
| """ | ||
| return list(self) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this creation of a new list every time components() is accessed is pretty wasteful. Either cache it or use list(self) directly as said earlier.
| ) # noqa: E501 | ||
|
|
||
| def convert_unit(self, unit: Union[str, sc.Unit]) -> None: | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding the rollback mechanism, which you did in ModelComponent.
| if not self.components: | ||
| raise ValueError("No components in the model to evaluate.") | ||
| result = None | ||
| for component in list(self): | ||
| value = component.evaluate(x) | ||
| result = value if result is None else result + value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just
if not self.components:
raise ValueError("No components in the model to evaluate.")
return sum(component.evaluate(x) for component in self)|
|
||
| class SampleModel(CollectionBase, TheoreticalModelBase): | ||
| """ | ||
| A model of the scattering from a sample, combining multiple model components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, you could extend the API to include __contains__ so we can do:
>>> model = SampleModel()
>>> gaussian = Gaussian(name="g1")
>>> model.add_component(gaussian)
>>> "g1" in model
True
>>> gaussian in model
True
>>> "nonexistent" in model
Falsewith something like
def __contains__(self, item: Union[str, ModelComponent]) -> bool:
if isinstance(item, str):
# Check by component name
return any(comp.name == item for comp in self)
elif isinstance(item, ModelComponent):
# Check by component instance
return any(comp is item for comp in self)
else:
return False|
As always, very useful comments :) here's the updated version |
damskii9992
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First and by far the biggest comment.
Do you WANT your SampleModel to BE a list and dictionary? Or do you want it to contain a list/dictionary?
It should only be the former if you REALLY want to be able to iterate over your model, slice it to get components, or do len(sample_model) instead of, for example len(sample_model.components).
I don't really think you want it to be a list/dictionary . . .
And even if you did, you should probably write up your own collections class, as relying on CollectionBase is dangerous as it is subject to be removed (or at least heavily modified).
Well, I honestly don't care too much exactly what it is. I want to have a collection of components that I/the user can easily access and work with, I want it to be integrated with the rest of corelib, i.e. reuse as much of corelib as possible, which also helps minimizing the amount of code I have to write/maintain in dynamics-lib. Would you recommend the components to be stored as a dict or list? |
|
@damskii9992 Could I have a quick sanity check of the changes to ModelComponent and Gaussian to use |
| self, | ||
| name="ModelComponent", | ||
| display_name="ModelComponent", | ||
| unit: Optional[Union[str, sc.Unit]] = "meV", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use Optional or Union for type hinting, from Python 3.10 and forward you simply write str | sc.Unit
Also, Optional essentially means it can also be a None, so it shouldn't even have been used here.
| def free_all_parameters(self): | ||
| """Free all parameters in the model component.""" | ||
| for p in self.get_parameters(): | ||
| for p in self.get_all_parameters(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and above should be get_fittable_parameters since you can't set dependent Parameters fixed attribute.
|
|
||
| old_unit = self._unit | ||
| pars = self.get_parameters() | ||
| pars = self.get_all_parameters() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be get_all_variables so you also convert DescriptorNumbers? You had a Plancks constant descriptor somewhere right? Shouldn't that one also be converted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't think so, since Planck's constant can't be converted to e.g. microeV. So I'll leave it as it is for now, and figure out the units issue when I get to the DiffusionModel
| def __init__( | ||
| self, | ||
| name="ModelComponent", | ||
| display_name="ModelComponent", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also remove the default display_name, it is not a mandatory attribute :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, but I like names... I'll probably remove it
|
@damskii9992 @rozyczko all ready for another review when you have time :) |
rozyczko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One important bug needs to be addressed.
The rest is for consideration
| self, | ||
| display_name: str = "MyComponentCollection", | ||
| unit: str | sc.Unit = "meV", | ||
| components: List[ModelComponent] = [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my... Mutable Default Argument bug. Using [] as a default argument is a well-known Python anti-pattern.
Python’s default arguments are evaluated once when the function is defined, not each time the function is called
Use None and create the list inside the function:
components: List[ModelComponent] | None = None,
then
components = components or []
Check out this example:
def append_to(element, to=[]):
to.append(element)
return to
then call it twice with different args
my_list = append_to(12)
print(my_list)
my_other_list = append_to(42)
print(my_other_list)
What did you expect in the second case? [42]? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this madness?!
| """ | ||
| Gaussian function: area/(width*sqrt(2pi)) * exp(-0.5*((x - center)/width)^2) | ||
| If the center is not provided, it will be centered at 0 and fixed, which is typically what you want in QENS. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your component functions are missing get_all_variables Implementation, relying on the parent class implementation.
The parent ModelBase.get_all_variables() may not find _area, _center, _width since they're not passed to super().init() anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am indeed relying on the parent class implementation, but it works :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the new ModelBase classs' get_all_variables method automatically finds the parameters of of the class if they are defined as properties which return a Parameter :)
| super().__init__( | ||
| display_name=display_name, | ||
| unit=unit, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gaussian calls super().__init__() first (which relies on self._unit), but _unit is set inside ModelComponent.__init__ via validate_unit. However, other components (Lorentzian, DHO) set self._unit = unit before calling super. This is inconsistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, the Gaussian does it right, setting self._unit = unit before calling super is redundant. Will fix.
| ) | ||
|
|
||
| def get_parameters(self) -> list[Parameter]: | ||
| def get_all_variables(self) -> list[DescriptorBase]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overrides get_all_variables properly but other components use get_all_parameters. The naming is inconsistent across the codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is because of how the coefficients of the polynomial are created and stored. ModelBase does not look through lists of Parameters, so I have to do it manually. All the other get_all_ from ModelBase rely on get_all_variables, so this was the only one that needed to be overridden.
It may make more sense to update ModelBase to also look through lists to check for Parameters (and Descriptors)
| Parameters | ||
| ---------- | ||
| name : str | ||
| Name of the sample model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Says name but param is display_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, thought I had gotten them all
| @@ -0,0 +1,296 @@ | |||
| import warnings | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code mixes old-style Union[X, Y] with new-style X | Y:
(old)
Union[Numeric, list, np.ndarray, sc.Variable, sc.DataArray]
(new)
Numeric | list | np.ndarray | sc.Variable | sc.DataArray
Pick one style for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to consistently use the new style, but clearly I'm not good at consistency. Should be better now
| @@ -0,0 +1,372 @@ | |||
| from copy import copy | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is part of the EasyScience ecosystem which relies heavily on as_dict()/from_dict() serialization, there are no tests for serializing/deserializing ComponentCollection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Copilot was happy to write those tests :)
damskii9992
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not reviewed the unit tests yet, but I feel like there is plenty here to look at first.
| name="ModelComponent", | ||
| unit: Optional[Union[str, sc.Unit]] = "meV", | ||
| **kwargs: Any, | ||
| display_name: str = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typehint should be str | None
| ): | ||
| self.validate_unit(unit) | ||
| super().__init__(name=name, **kwargs) | ||
| super().__init__(display_name=display_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No custom unique_name? Is that on purpose or did you just forget it?
| name: Optional[str] = "Polynomial", | ||
| coefficients: Optional[Sequence[Union[Numeric, Parameter]]] = (0.0,), | ||
| unit: Union[str, sc.Unit] = "meV", | ||
| display_name: str = "Polynomial", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this missing a typehint of None?
| unit: str | sc.Unit = "meV", | ||
| ): | ||
| self.validate_unit(unit) | ||
| super().__init__(display_name=display_name, unit=unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also (possibly) missing a custom unique_name? I won't write it on the rest of the places.
| param = Parameter(name=f"{display_name}_c{i}", value=float(coef)) | ||
| else: | ||
| raise TypeError( | ||
| "Each coefficient must be either a numeric value or a Parameter." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're allowing the user to pass Parameters? I thought we decided against that . . .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I needed it to make serialisation work with the old base...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I'll keep it for now, not use it unless I feel like I really need to, then remove it once SampleModel exists :)
| for component in self.components: | ||
| if hasattr(component, "area"): | ||
| area_params.append(component.area) | ||
| total_area += component.area.value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to check their units? Remember that we can add Parameters together. You might want to use that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I'll do that
| raise ValueError("Total area is zero; cannot normalize.") | ||
|
|
||
| if not np.isfinite(total_area): | ||
| raise ValueError("Total area is not finite; cannot normalize.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this ever happen? We should not allow Parameters to have infinite values and then this could never be infinite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also accounts for NaN's. Either way I think it's fine to have a check even if it will never be used if things go as we expect :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary checks does make code slower, but only by a little. Just wanted to point it out in case you hadn't realized ^^
|
|
||
| if not self.components: | ||
| raise ValueError("No components in the model to evaluate.") | ||
| return sum(component.evaluate(x) for component in self.components) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this even work? You are trying to sum the contributions element-wise, right? So if x is an array, you would be trying to use sum on a tuple of arrays/lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Fix all free parameters in the model. | ||
| """ | ||
| for param in self.get_all_parameters(): | ||
| param.fixed = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As previously mentioned, this and below should be get_fittable_parameters as you can't set the fixed attribute on dependent Parameters.
|
|
||
| if isinstance(item, str): | ||
| # Check by component name | ||
| return any(comp.display_name == item for comp in self.components) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all of this class, you should really think about where you use display_name and where you use unique_name.
display_name is ONLY meant for displaying, any and all logic should be using the unique_name.
Adding SampleModel which can contain ModelComponents and calculate the model at given x, with and without detailed balance.