
Implementation and example for EVM code are very complicated and almost impossible to understand #59

@siebenkopf

Description

I have some doubts about the implementation of the EVM code, which are the following:

  1. Most importantly, the return types are neither standard nor intuitive:
    a) Instead of turning the function into a generator, you should rather return the values in a useful data type (see next point).
    b) Instead of returning a (nested) 2-element tuple, I would rather return a (nested) dictionary with the first element of the tuple as key, and the second as value.
    c) When returning from the inference function, the first element is always the same string, "probs":
    yield ("probs", (batch_to_process, probs))

    There is no reason to provide this as a result; it will be ignored by any function that calls this one.

I do understand why you are returning it this way -- to be able to call it in parallel processes -- but handling dictionaries in the calling function is at least as simple.
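To illustrate point 1b, here is a minimal sketch of how the yielded tuples could be grouped into a nested dictionary in the calling function. It assumes the generator yields `("probs", (batch, probs))` tuples as quoted above; `collect_results` and `fake_evm_training` are hypothetical names, not part of the repository:

```python
def collect_results(generator):
    """Group yielded (key, (batch, value)) tuples into a nested dict."""
    results = {}
    for key, (batch, value) in generator:
        results.setdefault(key, {})[batch] = value
    return results

def fake_evm_training():
    # Stand-in for the real generator, just to make the sketch runnable.
    yield ("probs", (0, [0.1, 0.9]))
    yield ("probs", (1, [0.8, 0.2]))

results = collect_results(fake_evm_training())
# results == {"probs": {0: [0.1, 0.9], 1: [0.8, 0.2]}}
```

This keeps the generator-based implementation intact while giving callers a plain dictionary to work with.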

  2. Providing input parameters in an argument-parser structure is very uncommon:
    Any function that wants to use your functions will need to either take your provided arg parser or replicate its structure. The more common way is to have separate parameters with proper default values. Then you can collect your parameters in a dictionary and pass them to the function, such as:
params = dict(tailsize=0.1, distance_metric="euclidean")
results = EVM_Training(..., **params)
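A minimal sketch of what such a signature could look like. The parameter names mirror the example above, and the function body is a stand-in, not the real implementation:

```python
def evm_training(features, labels, tailsize=0.1,
                 distance_metric="euclidean", chunk_size=100):
    # Stand-in body: a real implementation would train the model here.
    # With explicit keyword parameters, callers no longer need argparse.
    return {"tailsize": tailsize, "distance_metric": distance_metric,
            "chunk_size": chunk_size}

params = dict(tailsize=0.25, distance_metric="cosine")
config = evm_training(None, None, **params)
```

A command-line front end can still forward parsed arguments via something like `evm_training(features, labels, **vars(parser.parse_args()))`, so nothing is lost for script users.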
  3. As far as I understood your code, the chunk size is based on the number of samples, of which there can be many per class. However, the concatenation of features is tested for equality with the chunk size:

    if len(temp) == args.chunk_size:

    This test might never be met when more samples are added to the collection at once than the defined chunk size.
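A small self-contained sketch of the boundary issue and one possible fix (all names are illustrative, not taken from the repository): when a whole batch of samples is appended at once, the length can jump past an exact `==` check, whereas testing `>=` and slicing off full chunks guarantees that no flush is missed:

```python
def chunked(class_batches, chunk_size):
    """Yield fixed-size chunks even when batches of samples arrive at once."""
    temp = []
    for batch in class_batches:          # each batch may hold many samples
        temp.extend(batch)
        while len(temp) >= chunk_size:   # an `==` test here could be skipped
            yield temp[:chunk_size]
            temp = temp[chunk_size:]
    if temp:                             # flush the remainder
        yield temp

chunks = list(chunked([[1, 2, 3], [4, 5], [6, 7, 8, 9]], chunk_size=4))
# chunks == [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```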

  4. Since Example.ipynb is supposed to serve as an example of how to use your functions, there are many parts that are very difficult to understand: https://github.com/Vastlab/vast/blob/2ad7fa23cb8f0f1acb71d6b6d9aced6a70562e0c/vast/opensetAlgos/Example.ipynb
    a) Cell 2 uses some pre-defined data that is downloaded from your web page, but it is nowhere explained what this data contains.
    b) Line 2 in cell 3 contains a nested list, map, list, lambda, apply construction that is impossible to follow, especially when we do not know what the data looks like.
    c) Cell 10 processes the complicated (see my point 1 above) output of your EVM_Training function using a combination of dict, list, zip, *, [1], which does not help in any way to understand how to handle the results of the training.
    d) Cell 11 contains a cat, list, dict, list, zip, *, [1], values combination. I have no idea what this is doing.

I would highly recommend changing the implementation to be more user-friendly and providing an example that showcases how to use your functions with simple Python, i.e., without any of cat, zip, *, [1], values, list, dict, and especially not several of them combined in one line (as far as possible).
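To make the point concrete, here is an illustration with made-up data (I do not claim these are the exact expressions from cells 10 and 11) of how such a dense one-liner compares to an explicit loop producing the same result:

```python
# Made-up data in the shape the training generator yields.
pairs = [("probs", (0, [0.1, 0.9])), ("probs", (1, [0.8, 0.2]))]

# Dense version: hard to follow without knowing the data layout.
dense = dict(list(zip(*pairs))[1])

# Explicit version: same result, readable at a glance.
readable = {}
for _key, (batch, probs) in pairs:
    readable[batch] = probs
```

Both produce `{0: [0.1, 0.9], 1: [0.8, 0.2]}`, but only the loop tells the reader what the data actually looks like.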
