YAML, which stands for Yet Another Markup Language, is a popular data serialization language that is commonly used for configuration files and in applications where data needs to be stored or transmitted.
One of the most useful data types in YAML is the array. YAML arrays allow you to store multiple values in a single key. This can help organize complex data and make it easier to iterate through.
In this comprehensive guide, we‘ll cover everything you need to know about YAML arrays, including:
- What are YAML arrays and why are they useful?
- The syntax for defining YAML arrays
- Multi-line vs single-line array formats
- Nesting arrays and sub-arrays
- Flattening and merging YAML array data
- Working with arrays in code
- Best practices for working with YAML arrays
What are YAML Arrays?
A YAML array is a data structure that allows you to store an ordered sequence of values in a single key. The values can be of any data type including strings, numbers, booleans, objects, or even other arrays.
For example, here is a simple YAML file with an array:
fruits:
- Apple
- Orange
- Strawberry
- Mango
The key fruits
contains an array with four string values. Each value starts with a hyphen -
indicating a new element.
Arrays are incredibly useful when you need to represent a collection of data, like a list or menu items. The array naturally groups the data together under a common key.
Some other common uses cases for YAML arrays include:
Managing databases records such as users, products, etc:
users:
- id: 1
name: John
- id: 2
name: Sarah
Storing scientific data sets and calculations:
sensor_readings:
- [0.58, 0.61, 0.63]
- [0.87, 0.92, 0.90]
Defining machine learning training features:
features:
- duration
- number_of_clicks
- time_of_day
Any time you need ordered collections of data, YAML arrays are a great choice.
Using arrays can simplify your data model tremendously. You can also iterate through arrays easily when consuming the YAML data in your applications.
YAML Array Syntax
YAML offers two main syntax formats for defining arrays:
Multi-line Arrays
The most common way of declaring arrays in YAML files is with each element on its own line:
fruits:
- Apple
- Orange
- Strawberry
This is considered the standard YAML array syntax. Each array item starts with a hyphen -
followed by the value.
You indent all the elements under the key to visually indicate this is an array. The indentation is optional in YAML but considered best practice for readability.
A few key points on multi-line YAML array syntax:
- The hyphen
-
denotes each new element - Subsequent lines are indented (usually 2 spaces)
- Each value can span multiple lines if needed
- Strings don‘t need quotes, but you can optionally add them
You can mix value types within the same array:
mixed_array:
- Apple
- 18
- true
-
name: John
age: 35
- [Cherry, Banana]
Here we have strings, a number, boolean, object, and nested array all together. This demonstrates the flexibility of YAML arrays.
Single-Line Arrays
You can also define arrays on a single line in YAML using inline syntax:
fruits: [Apple, Orange, Strawberry]
Instead of each value on its own line, they are comma-separated and enclosed in brackets [ ]
.
Some points on single-line YAML array syntax:
- Comma-separate each value
- Enclose everything in square brackets
- Useful for small, simple arrays
Single-line format works best for quick arrays, while multi-line works better for larger, nested data. But both are valid and you can mix and match styles.
Comparing Array Syntax Options
To quickly compare, here are the main YAML array syntax formats side-by-side:
Syntax | Example | Description |
---|---|---|
Multi-Line | yaml fruits: - Apple <br> - Orange |
Each value on its own line. Hyphen indicates new element. |
Single-Line | yaml fruits: [Apple, Orange] |
Inline syntax. Comma-separated values in square brackets. |
The multi-line format tends to be preferred because it remains readable with large, complex data. However single-line works well for compact simplicity with smaller arrays.
You can mix both formats freely within the same YAML file as needed.
Working with Nested Arrays and Sub-Arrays
One very powerful feature of YAML arrays is the ability to nest arrays within other arrays. This is known as sub-arrays.
For example:
companies:
- Company One
- John (CEO)
- Sarah (COO)
- Company Two
- Bill (President)
- Susan (CTO)
Our main companies
array contains two sub-arrays. The sub-arrays provide further management details for each company.
This helps organize related data together, almost like folders or categories.
To nest an array, simply define it indented under another array item designated by the hyphen -
.
You can nest arrays indefinitely to represent richer relationships:
continents:
- North America:
- Canada
- United States
- Mexico
- Europe:
- Germany
- France
- Spain
Here our continents contain countries which could even contain states/provinces and so on in deeper levels.
Referencing Array Elements
You can reference array elements directly using their index position:
fruits:
- Apple
- Orange
- Banana
favorite: *fruits_1
Here *fruits_1
sets favorite
to Orange
since that is index position 1. Note YAML indexes start at 0.
This referencing is helpful when data repeats and you want to normalize it.
Be careful with manual indexes though – changes could break references. Use judiciously.
Flattening and Merging YAML Arrays
Sometimes you may need to manipulate array structures – either flattening nested arrays or merging top-level peer arrays together.
Here is an example of flattening nested arrays:
Input
companies:
- Company One
- John
- Sarah
- Company Two
- Bill
- Susan
Output
employees:
- John
- Sarah
- Bill
- Susan
The nested structure was flattened into a simple array.
And arrays can also be merged/concatenated:
Input
fruits:
- Apple
- Orange
vegetables:
- Carrot
- Celery
Output
produce:
- Apple
- Orange
- Carrot
- Celery
The two peer arrays get merged together.
You‘ll need custom code to programmatically flatten or merge arrays in YAML content. But many YAML libraries have utilities to help with it.
Working with Arrays in Code
When you load YAML data into an application, arrays translate easily into native data structures.
For example, here is how YAML arrays get loaded in various languages:
Language | Loads As |
---|---|
JavaScript | Array |
Python | list |
Java | ArrayList |
C# | List/array[] |
This means you can immediately iterate through the arrays and access elements programmatically.
Consider this YAML data:
fruits:
- Apple
- Orange
- Banana
And iteration in Python:
import yaml
with open("data.yaml") as f:
data = yaml.safe_load(f)
for fruit in data[‘fruits‘]:
print(fruit)
# Apple
# Orange
# Banana
The built-in data structures make it very easy to work with loaded YAML arrays.
Best Practices for YAML Arrays
When working with arrays in YAML data, consider these best practices:
Use consistent indentation
Properly indent all arrays, especially nested ones. Consistent indentation helps scanability tremendously for larger, complex data sets.
Break long arrays into lines
Avoid cramming arrays onto one line. Use a clean multi-line format with each value getting its own line and hyphen indicator.
Add comments for context
Use YAML comments above arrays to document what information that array contains, especially for nested data.
Use array references carefully
While referencing values by numeric index can reduce duplication, it introduces fragility if data changes. Only use index references when array positions are stable.
Validate all array changes
Use a linter or other YAML validator to check changes to arrays so that the formatting remains valid after edits. Catch issues early.
Sticking to best practices ensures your YAML array data remains usable and robust against downstream changes.
Summary
This guide provided a deep look at arrays in YAML including:
- What YAML arrays are and advanced use cases like databases, science, and ML
- The core syntax formats including multi-line and single-line arrays
- nesting sub-arrays to represent rich relationships
- Flattening and merging array structures programmatically
- Built-in language integrations for iterating YAML arrays
- Expert best practices for working with array-based data
As you can see, YAML handles arrays as a first-class data structure making them invaluable for organizing ordered collections.
Both developers and IT teams can leverage YAML‘s strong array support to model complex data relationships in their applications and infrastructure.
Understanding arrays fully unlocks the rich data modeling capabilities of YAML when accuracy and flexibility matter.
For more info on working with YAML, check out our advanced YAML Learning Path across 8 additional tutorials!