Hi! Thank you for sharing your implementation of C4.5 (over 10 years ago!!)
I just wanted to point out a simple fix that can improve the runtime of the algorithm by orders of magnitude on some datasets. I hope this will be useful for people like me who find this repo in the future.
return {k: [v[i] for i in range(len(v)) if i in ind] for k, v in t.items()}
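Here, there is no need to loop over range(len(v)) and test i in ind for every i: when ind is a list, each membership test scans it, so the cost per column grows roughly as len(v) times len(ind). You can index with ind directly, which gives the same result as long as ind holds valid, unique indices in ascending order: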
return {k: [v[i] for i in ind] for k, v in t.items()}
On my machine, running the updated version on the mushroom UCI dataset takes <1s, while before it took about 50 seconds.
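For anyone who wants to check the equivalence before swapping the line in, here is a minimal sketch. The function names subset_slow and subset_fast and the tiny table are made up for illustration (they are not from the repo), and it assumes ind is a list of valid, unique row indices in ascending order:

```python
def subset_slow(t, ind):
    # Original form: walks every row index and tests membership in ind,
    # which is a linear scan each time when ind is a list.
    return {k: [v[i] for i in range(len(v)) if i in ind] for k, v in t.items()}


def subset_fast(t, ind):
    # Proposed form: picks the requested rows directly.
    return {k: [v[i] for i in ind] for k, v in t.items()}


# Hypothetical column-oriented table: one list of values per attribute.
table = {
    "cap": ["x", "b", "x", "f"],
    "odor": ["p", "a", "l", "n"],
    "label": ["p", "e", "e", "e"],
}
rows = [0, 2, 3]

# Both versions select the same rows when rows is sorted and has no duplicates.
assert subset_slow(table, rows) == subset_fast(table, rows)
print(subset_fast(table, rows))
```

If ind could be unsorted or contain duplicates, the two versions would differ (the fast one follows the order of ind), so that is worth checking against how the indices are built in your copy of the code.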
I plan to release a repository that will include your updated code. (hoping that's ok!)
Thanks!