-
Notifications
You must be signed in to change notification settings - Fork 5
Description
I have some doubts on the implementation of the EVM code, which are the following:
- Most importantly, the return types are neither standard nor intuitive:
Line 293 in 2ad7fa2
yield (
a) Instead of turning the function into a generator, you should rather return the values in a useful data type (see next point).
b) Instead of returning a (nested) 2-element tuple, I would rather return a (nested) dictionary with the first element of the tuple as key, and the second as value.
c) When returning from the inference function, the first element is always identical"probs":Line 340 in 2ad7fa2
yield ("probs", (batch_to_process, probs))
There is no reason to provide this as a result, it will be ignored in any function that calls this one.
I do understand why you are returning it this way -- to be able to call it in parallel processes -- but handling dictionaries in the calling function is at least as simple.
- Providing input parameters in an argument parser structure is very uncommon:
Line 133 in 2ad7fa2
args,
Any function that wants to use your functions will need to either take your provided arg parser, or replicate your parser structure. The more common way is to have a separate parameter with a proper default value. Then, you can collect your parameters in a dictionary and pass them to the function, such as:
params = dict(tailsize=0.1, distance_metric="euclidean")
results = EVM_Training(..., **params)
-
As far as I understood your code, the chunk size is based on the number samples, where we could have many per class. However, the concatenation of features is tested for equality with the chunk size:
Line 209 in 2ad7fa2
if len(temp) == args.chunk_size:
It might happen that I this test is never met, when we have more samples in the collection than we have defined as chunk size. -
For that the
Example.iynbshould serve as an example on how to use your functions, there are many parts that are very difficult to be understood: https://github.com/Vastlab/vast/blob/2ad7fa23cb8f0f1acb71d6b6d9aced6a70562e0c/vast/opensetAlgos/Example.ipynb
a) Cell 2 uses some pre-defined data that is downloaded from your web page, but it is explained nowhere what this data contains.
b) The line 2 in cell 3 contains a nestedlist, map, list, lambda, applyfunction that is impossible to follow, especially when we do not know how the data looks like.
c) Cell 10 processes the complicated (see my point 1 above) output of yourEVM_Trainingfunction using a combination ofdict, list, zip, *, [1]which does not help in any way to understand how to handle the results of the training.
d) Cell 11 contains acat, list, dict, list, zip, *, [1], valuescombination. WTF. I have no idea what this is doing.
I would highly recommend to change the implementation to be more user-friendly and provide an example that show-cases how to use your functions with simple python, i.e., without any of: cat, zip, *, [1], values, list, dict, especially not in combination in one line (as far as possible).