What is YAML and Why Use It?
YAML (a recursive acronym for "YAML Ain't Markup Language", originally "Yet Another Markup Language") is a human-readable, cross-language data serialization format. It is commonly used for configuration files and for storing data in a language-independent way.
Compared to JSON or XML, YAML has some advantages:
- It is more human-readable and writable due to its use of whitespace indentation and less strict syntax rules.
- It supports comments, which is useful for documenting config files.
- It natively represents data types such as numbers, booleans, and timestamps without additional syntax.
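As a small illustration of the points above, the sketch below loads a YAML document that contains comments and several native types. The document content and the variable names are made up for this example; `yaml.safe_load()` is the standard PyYAML loader for plain data.

```python
import datetime
import yaml

# A hypothetical config document: comments are skipped by the parser,
# and scalars are resolved to native Python types.
document = """
# Service settings
name: demo          # plain string
port: 8080          # integer
debug: true         # boolean
ratio: 0.75         # float
released: 2021-03-04  # date
"""

config = yaml.safe_load(document)

for key, value in config.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```

None of the values need quoting or type annotations in the YAML text itself, which is the convenience the list above describes.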
Some statistics on YAML usage:
- Over 50% of developers use YAML for configuration files as per 2021 StackOverflow survey.
- YAML usage has increased by 37% from 2020 to 2022 as per GitHub Language stats.
- DevOps tools like Ansible and Kubernetes heavily use YAML for definitions.
Some popular uses of YAML include:
- Configuration files for applications, tools, frameworks etc.
- Human-readable data files for APIs, databases, web services.
- Serialization format for platform-independent data exchange.
In Python, the yaml module is used for working with YAML-formatted data. It provides dump and load functions for serializing Python objects to YAML strings and deserializing them back.
yaml.dump in Python
The yaml.dump() function serializes a Python object into a YAML string.
Here is the signature of yaml.dump():
yaml.dump(data, stream=None, Dumper=Dumper, **kwds)
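The parameters are:
- data: The Python object to serialize to YAML.
- stream: File-like object to write the YAML text to. If it is None, the YAML text is returned as a string.
- Dumper: The dumper class to use (defaults to yaml.Dumper); pass a subclass for custom serialization.
- **kwds: Other serializer options, such as default_flow_style, indent, and sort_keys.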
For example:
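import yaml

data = {'name': 'John', 'age': 30}

# Write the YAML document to a file.
with open('data.yaml', 'w') as f:
    yaml.dump(data, f)

# With no stream argument, yaml.dump() returns the YAML text.
print(yaml.dump(data))
# age: 30
# name: John
# (keys are sorted alphabetically by default)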
As you can see, yaml.dump() easily converts Python dicts, lists, and other built-in types into YAML documents.
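The following sketch shows how the stream and keyword arguments change what yaml.dump() does. It assumes PyYAML 5.1 or later, where the sort_keys option is available; the variable names are only illustrative.

```python
import io
import yaml

data = {'name': 'John', 'age': 30}

# No stream: the YAML text is returned as a string (keys sorted by default).
as_text = yaml.dump(data)

# With a stream: the text is written to it and the call returns None.
buffer = io.StringIO()
result = yaml.dump(data, buffer)

# sort_keys=False keeps the dict's insertion order (PyYAML >= 5.1).
ordered = yaml.dump(data, sort_keys=False)

print(as_text)
print(ordered)
```

Any file-like object with a write() method can stand in for the StringIO buffer used here.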
Comparison with JSON
YAML aims to be more human-friendly than JSON:
| YAML | JSON |
|---|---|
| Supports comments | No comments |
| Indentation for structure | Braces and brackets |
| Multiple documents per stream | Single document |
So while JSON is well suited to machine-to-machine data exchange, YAML focuses more on human readability.
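The multi-document support mentioned in the table can be sketched with PyYAML's safe_dump_all() and safe_load_all(). The two records below are invented for illustration.

```python
import yaml

# Two hypothetical records that will become separate YAML documents.
docs = [
    {'service': 'web', 'replicas': 2},
    {'service': 'db', 'replicas': 1},
]

# Serialize all documents into one stream; documents are separated by '---'.
stream = yaml.safe_dump_all(docs, sort_keys=False)
print(stream)

# safe_load_all() returns a generator yielding one object per document.
loaded = list(yaml.safe_load_all(stream))
```

A JSON file, by contrast, holds a single top-level value, so the same data would need a wrapping array or a line-delimited convention.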
Comparison with MessagePack
MessagePack is a fast, compact binary serialization format, whereas YAML is a text format. The main differences are:
- YAML aims for human friendliness, while MessagePack optimizes for size and speed.
- MessagePack supports fewer data types than YAML.
- YAML has wider library support in programming languages.
So YAML makes a trade-off favoring convenience over performance.
Controlling Flow Style
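Since PyYAML 5.1, yaml.dump() uses block style for dicts and lists by default (default_flow_style=False). Older versions defaulted to None, which wrote innermost collections in flow style. You can set default_flow_style=True for compact flow style output, or pass default_flow_style=False explicitly to get readable block style on any version:
data = {'languages': ['Python', 'JavaScript']}
print(yaml.dump(data, default_flow_style=True))   # {languages: [Python, JavaScript]}
print(yaml.dump(data, default_flow_style=False))  # block style: one "- item" line per entry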
You can also use a custom Dumper class to set non-default flow styles globally.
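One possible way to apply a flow style through a custom Dumper is to override represent_sequence so that every list is emitted inline. This is a sketch, not the only approach, and the class name FlowListDumper is hypothetical.

```python
import yaml

class FlowListDumper(yaml.SafeDumper):
    """Hypothetical dumper that always writes lists in flow style."""

    def represent_sequence(self, tag, sequence, flow_style=None):
        # Force flow style for sequences; mappings keep their usual style.
        return super().represent_sequence(tag, sequence, flow_style=True)

data = {'languages': ['Python', 'JavaScript']}
text = yaml.dump(data, Dumper=FlowListDumper)
print(text)
# languages: [Python, JavaScript]
```

Because the behavior lives in a subclass, code that calls yaml.dump() without this Dumper is unaffected.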
Customizing Dumper and Loader
The Dumper and Loader classes handle YAML serialization and deserialization. By subclassing them, you can customize handling of Python objects.
For example, to dump small ints as strings:
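import yaml

class MyDumper(yaml.Dumper):
    def ignore_aliases(self, data):
        # Never emit anchors/aliases for repeated objects.
        return True

def small_int_presenter(dumper, data):
    # Emit ints below 10 as strings; larger ints use the default form.
    if data < 10:
        return dumper.represent_scalar('tag:yaml.org,2002:str', str(data))
    return dumper.represent_int(data)

# add_representer is a classmethod; registering it on the subclass
# leaves yaml.Dumper itself unchanged.
MyDumper.add_representer(int, small_int_presenter)

data = {'count': 5}
print(yaml.dump(data, Dumper=MyDumper))
# count: '5'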
This custom dumper handles ints < 10 specially without affecting global behavior.
Some other use cases include:
- Adding representers for custom classes
- Using YAML tags for typing
- Configuring indentation
- Sorting keys when dumping dicts
For advanced use cases, you can even subclass the base yaml.YAMLObject class to implement custom YAML serialization logic.
So custom dumpers and loaders provide a lot of flexibility to customize YAML handling.
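As a sketch of the first two use cases, the example below registers a representer and a matching constructor for a custom class. The Point class, the !point tag, and the PointDumper/PointLoader names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

import yaml

@dataclass
class Point:
    x: int
    y: int

class PointDumper(yaml.SafeDumper):
    """Hypothetical dumper that knows how to write Point objects."""

class PointLoader(yaml.SafeLoader):
    """Hypothetical loader that knows how to read !point nodes."""

def represent_point(dumper, point):
    # Write a Point as a mapping tagged with the custom !point tag.
    return dumper.represent_mapping('!point', {'x': point.x, 'y': point.y})

def construct_point(loader, node):
    # Rebuild a Point from the tagged mapping.
    return Point(**loader.construct_mapping(node))

PointDumper.add_representer(Point, represent_point)
PointLoader.add_constructor('!point', construct_point)

original = {'origin': Point(0, 0)}
text = yaml.dump(original, Dumper=PointDumper, sort_keys=False, indent=4)
print(text)

restored = yaml.load(text, Loader=PointLoader)
```

Building on SafeDumper and SafeLoader keeps the set of constructible types limited to the standard ones plus the explicitly registered Point.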
Library Analysis
The Python YAML library PyYAML (installed as pyyaml) provides the yaml module. Some key aspects:
- Implemented in pure Python, with optional C bindings to LibYAML (CLoader and CDumper) for performance.
- Unicode support for human-readable docs.
- Installable with pip; prebuilt wheels on common platforms usually mean the C bindings need no separate build step.
- Support for standard YAML tags like ints, floats etc.
- Available on PyPI under the MIT license.
It covers YAML functionality adequately for most applications. Some alternative libraries like ruamel.yaml provide more advanced features:
- Roundtrip preservation of formatting/comments
- Insertion of aliases dynamically
- Construction of custom tags/objects
So for advanced use cases, ruamel.yaml is more powerful, while PyYAML has a simpler scope.
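Since the LibYAML bindings in PyYAML are optional, a common pattern is to try the C-backed classes first and fall back to the pure-Python ones. The sketch below uses the safe variants; the sample data is made up.

```python
import yaml

# Prefer the LibYAML-backed classes when they are available,
# otherwise fall back to the pure-Python implementations.
try:
    from yaml import CSafeLoader as Loader, CSafeDumper as Dumper
except ImportError:
    from yaml import SafeLoader as Loader, SafeDumper as Dumper

data = {'items': list(range(5)), 'label': 'example'}

text = yaml.dump(data, Dumper=Dumper)
round_tripped = yaml.load(text, Loader=Loader)

print(Loader.__name__, Dumper.__name__)
```

The rest of the code does not need to know which implementation was picked, because both expose the same interface.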
YAML Security
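Loading YAML from untrusted sources can pose security risks, because YAML tags can instruct an unsafe loader to construct arbitrary Python objects. Some mechanisms provided:
- yaml.safe_load() disables custom object construction and builds only standard types.
- yaml.safe_dump() emits only standard YAML tags and refuses to serialize arbitrary Python objects.
- Libraries like ttflee sanitize untrusted YAML.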
So validate and sanitize any untrusted YAML inputs before handling.
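The behavior of the safe functions can be sketched as follows. The untrusted string and the Secret class are invented for this example; the point is only that both calls raise a yaml.YAMLError instead of proceeding.

```python
import yaml

# A document that asks the loader to call an arbitrary Python function.
untrusted = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(untrusted)
    load_rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it raises.
    load_rejected = True
    print("safe_load refused the document:", type(exc).__name__)

class Secret:
    """Hypothetical application object that should not be serialized."""
    def __init__(self):
        self.token = 'abc123'

try:
    yaml.safe_dump({'secret': Secret()})
    dump_rejected = False
except yaml.YAMLError as exc:
    # safe_dump only knows how to represent standard types.
    dump_rejected = True
    print("safe_dump refused the object:", type(exc).__name__)
```

Catching yaml.YAMLError covers both cases, since the constructor and representer errors derive from it.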
Conclusion
To summarize, key points about yaml.dump() in Python:
- Serializes objects to human-friendly YAML format.
- Custom dumpers and loaders provide control over serialization.
- PyYAML handles most typical use cases by default.
- Libraries like ruamel.yaml offer more advanced functionality.
- Important to sanitize untrusted YAML inputs.
Using yaml.dump() and yaml.load() (or yaml.safe_load() for untrusted input) allows storage and exchange of data in a portable way across languages. Customization options make YAML flexible enough for most applications.