dfeneyrou/styml

STYML : An efficient C++ single-header STrictYaML parser and emitter

The choice of data serialization formats seems large, but each of them comes at a cost:

  • YAML is readable, but error-prone and bloated
  • JSON is simple, but limited and verbose
  • XML is solid, but hard to work with and hard for humans to read
  • TOML is minimal, but noisy and scales poorly

styml is an implementation of StrictYAML and aims at:

  • simplicity, by removing bloat from YAML
  • readability, inherited from YAML
  • efficiency, fast and lean on memory
  • portability, copying a single header is enough
  • user friendliness, with preserved item order, persistent comments and O(1) map access

A lean and practical subset of YAML

styml is a subset of YAML with drastic cuts:

  • the heterogeneous data types are removed: all data are strings (not as crazy as it seems)
  • the superfluous object representation is removed
  • the complex and error-prone anchors and references are removed
  • the nested-secondary-syntax JSON inline flow style is removed
  • the block mapping keys, or complex keys, are removed (keys are single line only)

styml targets most use-cases of YAML (cross-language data sharing, log files, interprocess messaging, object persistence, configuration files, ...) without entering the feature-creep zone.
The price is that it does not fit the projects that genuinely require these advanced/complex features.
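
To illustrate why "all data are strings" is less restrictive than it sounds, here is a minimal sketch of converting a stored string on demand, with a fallback value. The helper `parseOr` is hypothetical and only mimics the idea; it is not part of styml and not how styml implements its conversions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of styml): converts a stored string into a
// numeric T, returning `fallback` when the text is not entirely a valid T.
template <typename T>
T parseOr(const std::string& text, const T& fallback)
{
    std::istringstream stream(text);
    T value{};
    if (!(stream >> value)) { return fallback; }  // Not convertible at all
    stream >> std::ws;                            // Tolerate trailing whitespace
    return stream.eof() ? value : fallback;       // Reject trailing garbage such as "25abc"
}

inline void parseOrExample()
{
    assert(parseOr<int>("25", 0) == 25);         // A convertible string
    assert(parseOr<int>("John Doe", -1) == -1);  // Falls back to the default
}
```

The text stays the single source of truth, and the caller picks the type at the point of use.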

Quick start

  1. Copy styml.h into your project. No external dependencies.
  2. Add a few lines of code, as in the example below
    #include "styml.h"
    ...

    // Parse the input text string
    styml::Document root;
    try {
        root = styml::parse(inputStringText);
    } catch (styml::ParseException& e) {
        ...
    }

    // Read items
    std::string ciRunCmds = root["build"]["steps"][0]["run"].as<std::string>();
    float fontSize = root["build"]["font size"].as<float>(16.); // With default

    // Write items
    root["build"]["font size"] = fontSize + 1.;

    // Emit (strings are easily saved on disk)
    std::string pythonString = root.asPyStruct(); // Emit as python structure
    std::string yamlString = root.asYaml(); // Emit as YAML, keeping comments and item order

As measured in the performance section, styml is fast enough to be directly used as a live database for configuration.

API

The API is quite straightforward, especially if you are familiar with yaml-cpp.

The two main sections are:

A `parse` function to create a `Document` from a string

Document and parsing function

A Document is simply a (root) Node with 2 additional features:

  • it owns the YAML tree
    • its destruction releases the document. All Node objects related to it are invalidated and must no longer be used.
  • it owns the emission API
    • std::string asPyStruct(bool withIndent = false) const emits a Python evaluable string, compact (default) or with indent
    • std::string asYaml() const emits a YAML string

A Document can be created from scratch.

// Builds the following document:
// preferences:
//   font size: 4
//   font name: Helvetica
//   names:
//     - toto
//     - 14
styml::Document doc;
doc = styml::NodeType::MAP; // Choice is between MAP and SEQUENCE. This choice must be made.

doc["preferences"]              = styml::NodeType::MAP; // This node is also a MAP
doc["preferences"]["font size"] = 4;
doc["preferences"]["font name"] = "Helvetica";

doc["preferences"]["names"] = styml::NodeType::SEQUENCE; // This node is a SEQUENCE
doc["preferences"]["names"].push_back("toto");
doc["preferences"]["names"].push_back(14); // Just for example, as this number is turned into a string 

It can also be created from a YAML string in memory with one of the following functions:

// Canonical form
styml::Document parse(const std::string& text);

// Variant with const char* input. It does not need to be zero terminated
styml::Document parse(const char* text, uint32_t textSize);

// Variant with const char* input. It must be zero terminated
styml::Document parse(const char* text);

A `Node` class representing a typed item in the YAML tree

Node

The main object in styml is Node, which represents a typed item in the YAML tree.
It can have 4 different types:

| Node type | Description | Comments | Example |
|---|---|---|---|
| NodeType::VALUE | In StrictYAML, a value is always a string | Converted into a compatible format with .as<Type>() or .as<Type>(default value) | 25 in age: 25 is a convertible string |
| NodeType::SEQUENCE | A sequence container is an ordered list of children of any type, except NodeType::KEY | Children are accessed by their index number | - a is a sequence of size 1 containing a value string |
| NodeType::MAP | A map container is an unordered list of children exclusively of type NodeType::KEY | Children are accessed by their string name | age: 25 is a map of size 1 containing a child key named age |
| NodeType::COMMENT | Represents a comment item | | # This is a comment |

The Node API is restricted depending on its type, as shown in the table below ("X" means accessible):

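| Method | Value | Sequence | Map | Key | Comment |
|---|---|---|---|---|---|
| NodeType type() | X | X | X | X | X |
| bool isValue() | X | X | X | X | X |
| bool isKey() | X | X | X | X | X |
| bool isSequence() | X | X | X | X | X |
| bool isMap() | X | X | X | X | X |
| bool isComment() | X | X | X | X | X |
| Node& operator=(const T&) | X | X | X | X (via value) | |
| Node& operator=(newKind) | X | X | X | X (via value) | |
| std::string keyName() | | | | X | |
| Node value() | | | | X | |
| as<T>() | X | | | X (via value) | |
| as<T>(const T& deflt) | X | | | X (via value) | |
| iterator begin() | | X | X | | |
| iterator end() | | X | X | | |
| size_t size() | | X | X | | |
| Node operator[](uint32_t) | | X | | | |
| void push_back(const T&) | | X | | | |
| void push_back(NodeType) | | X | | | |
| void insert(uint32_t, const T&) | | X | | | |
| void insert(uint32_t, NodeType) | | X | | | |
| void remove(uint32_t) | | X | | | |
| void pop_back() | | X | | | |
| bool hasKey(const std::string&) | | | X | | |
| Node operator[](const std::string&) | | | X | | |
| void insert(const std::string&, const T&) | | | X | | |
| void insert(const std::string&, NodeType) | | | X | | |
| bool remove(const std::string&) | | | X | | |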

Complementary information below:


Exceptions

After careful consideration, styml error handling is based on C++ exceptions rather than carrying an error context in each API:

  • it enables bloat-free tree manipulation API like the operator[] which is a natural access for containers
  • it allows a global handling of error for a whole section of YAML tree manipulation

Special care was given to the error messages, and exceptions are kept simple.
They just contain a message (queried with the standard what()) and can be of 3 kinds:

  • ParseException: raised only during parsing
  • AccessException: raised when manipulating the tree
  • ConvertException: not seen by the user, but shall be thrown when implementing a custom type converter
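
As an illustration of the second point above (one handler for a whole section of accesses), here is a minimal self-contained sketch. The `sketch` namespace, `AccessError`, `get` and `describe` are hypothetical stand-ins built on `std::map`; they are not styml's classes.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

namespace sketch {  // Hypothetical stand-ins, not styml's actual classes

using Config = std::map<std::string, std::string>;

struct AccessError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Throwing accessor: no error code to check at each call site
inline const std::string& get(const Config& config, const std::string& key)
{
    auto it = config.find(key);
    if (it == config.end()) { throw AccessError("Missing key '" + key + "'"); }
    return it->second;
}

// A single handler covers the whole sequence of accesses
inline std::string describe(const Config& config)
{
    try {
        return get(config, "name") + " v" + get(config, "version");
    } catch (const AccessError& e) {
        return std::string("error: ") + e.what();
    }
}

inline void describeExample()
{
    const Config complete{{"name", "styml"}, {"version", "1.0"}};
    assert(describe(complete) == "styml v1.0");
}

}  // namespace sketch
```

With error codes instead, each of the two lookups would need its own check before the values could be combined.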

Examples

Parsing a YAML string and emitting it in Python

const char* inputText = R"END(
foo: 1
bar: John Doe
)END";

// Parse
styml::Document root;
try {
    root = styml::parse(inputText);
} catch (styml::ParseException& e) {
    printf("Parsing error: %s\n", e.what());
    exit(1);
}

// Emit in Python with indentation (bigger but more readable for humans)
std::string output = root.asPyStruct(true);
printf("%s\n", output.c_str());

Reading and writing fields

const char* document = R"END(
name: build machine
steps:
)END";

styml::Document root = styml::parse(document);
assert(root["name"].as<std::string>() == std::string("build machine"));
assert(root["steps"].as<std::string>() == std::string(""));

root["version"] = "1.0.0";
root["steps"] = styml::NodeType::SEQUENCE; // Override the empty string with a sequence
root["steps"].push_back("first value is string");
root["steps"].push_back(3.14159); // Reminder: stored as a string

printf("YAML:\n%s\n", root.asYaml().c_str());
/* Output is:
YAML:
name: build machine
steps:
  - first value is string
  - 3.141590
version: 1.0.0
*/

Building a map from scratch and accessing it

constexpr int MaxMapSize = 1000000;
styml::Document root;

// Create the lookup "<number>" = number (stored as a string)
root = styml::NodeType::MAP;
for (int i = 0; i < MaxMapSize; ++i) { root[std::to_string(i)] = i; }

// Remove one key out of every three
for (int i = 0; i < MaxMapSize; i += 3) { root.remove(std::to_string(i)); }

// Check correctness
for (int i = 0; i < MaxMapSize; ++i) {
    if ((i % 3) == 0) {
        assert(!root.hasKey(std::to_string(i)));
    } else {
        styml::Node n = root[std::to_string(i)];
        assert(n.isValue());
        assert(n.as<std::string>() == std::to_string(i));
    }
}

Building a sequence from scratch and accessing it

constexpr int MaxSequenceSize = 1000000;
styml::Document root;

// Create the array of doubled indices (each stored as a string)
root = styml::NodeType::SEQUENCE;
for (int i = 0; i < MaxSequenceSize; ++i) { root.push_back(2 * i); }

// Check correctness
for (int i = 0; i < MaxSequenceSize; ++i) {
    assert(root[i].as<int>() == 2 * i);
}

Performance

To evaluate performance with references, styml is compared to the following C++ YAML libraries:

  • yaml-cpp
  • rapidyaml

To be fair, please note that these libraries are full-featured YAML and have to deal with more complexity than styml does.
On the other hand, trading complexity for benefits is exactly the point...

Measures are all done on the same laptop (i7-11800H @2.30GHz on Linux).

TL;DR: Results show that styml leverages the gained simplification: it is much faster and more memory efficient than both libraries above, by multiple orders of magnitude on the access timings.

Some key details about the implementation leading to these results:

  • Use of arena allocator to store and work efficiently with strings
  • Use of high performance hashtable for O(1) access time for maps
  • Use of efficient storage and document tree representation to achieve low memory footprint
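
One common way to combine preserved item order with O(1) average lookup is to pair a vector of items with a hash index. The sketch below shows this generic technique only; `OrderedMap` is a hypothetical class and does not reflect styml's actual storage or hashtable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical container (not styml's implementation)
class OrderedMap {
public:
    // Inserts or overwrites a value; new keys are appended, so insertion order is kept
    void set(const std::string& key, const std::string& value)
    {
        auto it = index_.find(key);
        if (it != index_.end()) {
            items_[it->second].second = value;
            return;
        }
        index_.emplace(key, items_.size());
        items_.emplace_back(key, value);
    }

    // Average O(1) lookup through the hash index; nullptr when the key is absent
    const std::string* find(const std::string& key) const
    {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &items_[it->second].second;
    }

    // Items in insertion order, as needed to emit a document back unchanged
    const std::vector<std::pair<std::string, std::string>>& items() const { return items_; }

private:
    std::vector<std::pair<std::string, std::string>> items_;
    std::unordered_map<std::string, std::size_t>     index_;
};

inline void orderedMapExample()
{
    OrderedMap map;
    map.set("name", "build machine");
    map.set("version", "1.0.0");
    assert(*map.find("version") == "1.0.0");
    assert(map.items().front().first == "name");  // First inserted, first emitted
}
```

Removal is left out of this sketch: keeping order and O(1) access under deletion needs extra bookkeeping, such as tombstones or a linked list.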

The code used to evaluate these libraries can be found here.

The reference YAML files are taken from rapidyaml and are compatible with StrictYAML:

Parsing speed

The parsing speed is the size of the input file divided by the time to parse it into a usable structure in memory.

| Filename | yaml-cpp | rapidyaml (in place) | styml | Speed factor |
|---|---|---|---|---|
| "Map" | 4.958 MB/s | 55.688 MB/s | 70.644 MB/s | 14.2x and 1.3x |
| "Seq" | 5.808 MB/s | 47.214 MB/s | 138.082 MB/s | 23.8x and 2.9x |

Note

styml parses between 30% and 200% faster than rapidyaml, and at least 14 times faster than yaml-cpp.
Benefit from StrictYAML simplification: less syntax to handle means more speed.
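
The speed metric defined above (input size divided by elapsed time) can be sketched as follows. Both helper names are hypothetical, and 1 MB is assumed to be 10^6 bytes, which the benchmark may or may not use; this is not the benchmark code itself.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: throughput in MB/s, assuming 1 MB = 1e6 bytes
inline double throughputMBps(std::size_t byteCount, std::chrono::duration<double> elapsed)
{
    return (static_cast<double>(byteCount) / 1e6) / elapsed.count();
}

// Times any callable with a monotonic clock and converts the result into MB/s
template <typename Work>
double measureMBps(std::size_t byteCount, Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return throughputMBps(byteCount, elapsed);
}

inline void throughputExample()
{
    // 10,000,000 bytes processed in 2 seconds is 5 MB/s
    assert(throughputMBps(10000000, std::chrono::duration<double>(2.0)) == 5.0);
}
```

In practice the callable would wrap the parsing call, and the measure would be repeated several times to smooth out noise.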

Memory usage factor

The memory factor is the quantity of memory used after parsing divided by the input file size:

| Filename | yaml-cpp | rapidyaml (in place) | styml | Memory gain |
|---|---|---|---|---|
| "Map" (filesize 10.8 MB) | 87.7x (945.2 MB) | 14.9x (160.6 MB) | 6.8x (73.3 MB) | 12.9x and 2.2x |
| "Seq" (filesize 7.9 MB) | 64.0x (505.1 MB) | 20.4x (160.6 MB) | 3.6x (28.2 MB) | 17.9x and 5.7x |

Note

styml uses less than half the memory consumed by rapidyaml and over 12x less memory than yaml-cpp, while also indexing map access.
Benefit from StrictYAML simplification: fewer object types to encode means a more optimized memory layout.

YAML Emission speed

The emission speed is the size of the input file divided by the time to emit it back in YAML (after parsing, excluded from the measure):

| Filename | yaml-cpp | rapidyaml | styml | Speed factor |
|---|---|---|---|---|
| "Map" | 7.766 MB/s | 266.436 MB/s | 326.281 MB/s | 42.0x and 1.2x |
| "Seq" | 9.871 MB/s | 353.479 MB/s | 323.400 MB/s | 32.8x and 0.9x |

Note

styml and rapidyaml are emitting YAML roughly at the same speed, for a mix of maps and sequences. styml is also able to emit python structures at the speed of 416 MB/s and 593 MB/s respectively.

Document access speed

Building (=writing) a document programmatically from scratch through the API, in millions of items per second:

| Structure | yaml-cpp | rapidyaml | styml | Speed factor |
|---|---|---|---|---|
| Map of 10000 | 0.014 Mi/s | 0.053 Mi/s | 9.091 Mi/s | 649x and 168x |
| Sequence of 10000 | 4.517 Mi/s | 0.075 Mi/s | 42.194 Mi/s | 9x and 562x |
| Map of 1000000 | (quadratic, too slow) | (quadratic, too slow) | 7.875 Mi/s | N/A |
| Sequence of 1000000 | 3.700 Mi/s | (quadratic, too slow) | 39.569 Mi/s | 10x and N/A |

Note

Only styml has an O(1) access time per map field; the others have an O(N) access time, leading to quadratic time for a full build.
rapidyaml does not even benefit from the random access property of sequences.
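
To see why an O(N) lookup per field makes a full build quadratic, the sketch below counts key comparisons when every insertion first scans the existing keys. It is a generic illustration with a hypothetical function name, not the code of yaml-cpp or rapidyaml.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical linear-lookup map build: each insertion scans all existing keys
// to detect a duplicate. Returns the total number of key comparisons.
inline std::uint64_t comparisonsForLinearBuild(std::uint32_t itemCount)
{
    std::vector<std::string> keys;
    std::uint64_t            comparisons = 0;
    for (std::uint32_t i = 0; i < itemCount; ++i) {
        const std::string key   = std::to_string(i);
        bool              found = false;
        for (const std::string& existing : keys) {
            ++comparisons;
            if (existing == key) {
                found = true;
                break;
            }
        }
        if (!found) { keys.push_back(key); }
    }
    return comparisons;
}

inline void linearBuildExample()
{
    // N distinct keys cost 0 + 1 + ... + (N-1) = N*(N-1)/2 comparisons
    assert(comparisonsForLinearBuild(1000) == 499500);
}
```

Going from 10000 to 1000000 items multiplies the item count by 100 but the comparison count by about 10000, which is consistent with the "too slow" entries in the table above.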


Reading fields of a document programmatically through the API, in millions of items per second:

| Structure | yaml-cpp | rapidyaml | styml | Speed factor |
|---|---|---|---|---|
| Map of 10000 | 0.014 Mi/s | 0.053 Mi/s | 42.553 Mi/s | 3000x and 800x |
| Sequence of 10000 | 37.037 Mi/s | 0.076 Mi/s | ~1000.000 Mi/s | ~30x and ~10000x |
| Map of 1000000 | (quadratic) | (quadratic) | 23.317 Mi/s | N/A |
| Sequence of 1000000 | 25.757 Mi/s | (quadratic) | 745.712 Mi/s | 29x and N/A |

Note

rapidyaml scales poorly with large structure access and it handles sequences like maps, in a quadratic way.
yaml-cpp has a genuine but rather slow random access for sequence when reading an indexed array.

Misc

Extension with custom type


Converters from/to strings are built-in for usual types:

int valueInt                  = valueNode.as<int>();
uint32_t valueUInt32          = valueNode.as<uint32_t>();
double valueDouble            = valueNode.as<double>();
std::string valueString       = valueNode.as<std::string>();
const char* valueConstCharPtr = valueNode.as<const char*>();
...

Defining conversions for your own types is done by specializing the styml::convert<> class.

The example below explains the different steps:

  • Let's consider the custom point structure:
// Custom structure
struct MyPoint {
    float x;
    float y;
    int   value;
};
  • An implementation of the converter specializing the styml::convert<> class is:
namespace styml
{
template<>
struct convert<MyPoint> {
    // From custom type to std::string. The format (here with brackets) does not matter as long
    // as it stays on one line and the encode and decode methods are matching.
    static std::string encode(const MyPoint& point)
    {
        char workBuf[256];
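        // snprintf returns the untruncated length: a value >= the buffer size means truncation
        const int written = snprintf(workBuf, sizeof(workBuf), "[ %f, %f, %d ]", point.x, point.y, point.value);
        if (written < 0 || written >= (int)sizeof(workBuf)) {
            throwMessage<ConvertException>("Too small internal buffer (%zu) for encoding", sizeof(workBuf));
        }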
        return workBuf;
    }

    // From C string to custom type
    static void decode(const char* strValue, MyPoint& point)
    {
        if (sscanf(strValue, "[ %f, %f, %d ]", &point.x, &point.y, &point.value) != 3) {
            throwMessage<ConvertException>("Cannot convert the following string into a MyPoint structure: '%s'", strValue);
        }
    }
};
}  // namespace styml
  • And its usage is identical to built-in types:
MyPoint point{3.14f, 2.78f, 42};

// The structure 'MyPoint' is turned into a string via a call to convert<MyPoint>::encode
root["custom"] = point;

// The string is turned into a 'MyPoint' structure via a call to convert<MyPoint>::decode
MyPoint pointRead = root["custom"].as<MyPoint>();

assert(memcmp(&pointRead, &point, sizeof(MyPoint)) == 0);

IMPORTANT:

  • the conversion class shall be placed in the styml namespace
  • it is up to the conversion class to throw the ConvertException in case of syntax errors
  • design note: std::string and exceptions are used for convenience, not performance
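
The encode/decode contract can be exercised without the library. The sketch below is a standalone round trip using the same one-line bracket format; `Point3`, `encodePoint` and `decodePoint` are hypothetical names, and `std::runtime_error` stands in for ConvertException. Note that `snprintf` returns the length the full output would have had, so truncation is detected with `>=`.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

struct Point3 {  // Hypothetical stand-in for a custom structure
    float x;
    float y;
    int   value;
};

// From custom type to a single-line string
inline std::string encodePoint(const Point3& p)
{
    char      buf[256];
    const int written = std::snprintf(buf, sizeof(buf), "[ %f, %f, %d ]", p.x, p.y, p.value);
    if (written < 0 || written >= static_cast<int>(sizeof(buf))) {
        throw std::runtime_error("Too small internal buffer for encoding");
    }
    return buf;
}

// From string back to the custom type; the format must match encodePoint
inline Point3 decodePoint(const std::string& text)
{
    Point3 p{};
    if (std::sscanf(text.c_str(), "[ %f, %f, %d ]", &p.x, &p.y, &p.value) != 3) {
        throw std::runtime_error("Cannot convert '" + text + "' into a point");
    }
    return p;
}

inline void roundTripExample()
{
    const Point3 back = decodePoint(encodePoint(Point3{1.5f, -2.25f, 42}));
    assert(back.x == 1.5f && back.y == -2.25f && back.value == 42);
}
```

The values in the example are exactly representable with 6 decimals; with the `%f` format, other floats may lose precision in a round trip, so a real converter may prefer a format such as `%.9g`.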

Support

styml requires C++17 or above.

Supported OS:

  • Linux
  • Windows

Note: performance on Windows is lower than on Linux.

Limitations

  • Missing API to modify comments
  • Partial Unicode escaping

Also this project is young, feedback is welcome!

License

styml source code is available under the MIT license

Associated components:
