Skip to content

chore: improve code maintainability #230

@techfg

Description

@techfg

As currently written in v0.6.7, the code itself has a very high cost of maintenance. Making even the smallest change in any number of areas requires an incredible amount of care and attention given the way the code is written/structured. Combined with the general lack of tests, any change carries a significant risk of regressions and/or new bugs being introduced.

Beyond the lack of unit tests and general testability of the code itself (see #166), some examples that contribute to the high cost & risks, in order of their contributing weight, are:

  1. Long running functions that orchestrate complex process that are difficult to understand purpose and/or expected behavior. For example, Retrieve is 224 lines long and is comprised of at least 11 different "steps" that it orchestrates all inline within the single function. Outside of a few comments here and there that don't add a ton of value, especially to someone not familiar with the terminology "warden, pliny, etc.", this code takes more time then necessary to understand. Breaking it out in to separate functions with names that clarify their purpose would simplify this greatly.
  2. In the absence of unit tests, lack of any comments around expected behaviors of complicated blocks of code where it is not easily obvious what is happening. For example, starting at line 129 and going for 72 lines, the value of --since is parsed but other than a few terse comments scattered around, there is no clear definition of the full set of supported formats and what the complete expected behavior is.
  3. Inconsistent use of Named return values and "Naked" returns - There are short functions that use named returns, short functions that don't, long functions that do and long functions that don't. There are short functions that used naked returns, short functions that don't, long functions that do and long functions that don't - there are even functions that have mixed used (e.g., here and here as well as here and here which are only 4 lines apart). While usage should be consistent either across the board or within a "short/long" type, especially for "long" functions, it should be quick and painless to identify return values. For example, looking at line 216, you have to scroll up 142 lines to find the names of the returned values, then scroll back down to find the last place they were set.
  4. Inconsistent constant naming - Some constants are UPPER SNAKE CASE and some constants camelCase (or upper camel/pascal case), there's even inconsistency with the same file here and here
  5. Misleading naming of Public vs Private functions - Nearly everything is a "Public" function even though many functions are only called from within the same file (e.g., SanitizeZip), let alone the same package and its obvious their use should never be consumed "publicly". Not identifying a function as private and making it part of the "API" so to speak, even for a non-library package, results in increasing the cost of maintaining that code because of things like having to add validations (that may be covered by other calling functions), having to think of every possible way a function could be used from the "outside", etc. More importantly, when trying to understand the code, "private" means something to the person trying to understand the code in terms of its intended purpose/usage, it helps inform the person. Given the lack of testability of the current code, as currently written, some current use of "Public" is required to make what would otherwise be "private" functions testable but this is a clear indication of the lack of testability of the code and shouldn't be required in order to test it properly.

All of these issues (and others not specifically mentioned), increase the amount of time that it takes to try to understand the code and, in some cases, coupled with the general lack of unit tests, gaining a full understanding is nearly impossible.

To summarize:

  1. Since the repo doesn't have any official coding standards and/or contributor guidelines, as someone trying to contribute (either internal or external to Skuid), it should be relatively easy to identify the "expected conventions" used, however there is no consistency currently.
  2. As that person trying to contribute, the existing code should be "easy enough" to understand. The CLI is fairly complex, in a lot of cases unnecessarily, but either way, the complexity should be relatively straightforward to understand by spending a short amount of time with the code.
  3. It doesn't matter what the standards are, the key is for the code to be consistent and well structured so that the cost of maintenance is lowered and risk of introducing bugs or causing regressions kept to an absolute minimum as a result.

In short, and again, separate from writing tests and improving testability, the following improvements should be made:

  1. Code refactored into units that help clarify their intended purpose
  2. Comments should be added where appropriate explaining complex blocks of code so that anyone can understand
  3. Consistency in use of named/unnamed returns
  4. Constants named consistently
  5. Consistent use of public/private functions based on their intended purpose/usage

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions