YAML, which stands for "YAML Ain't Markup Language" (originally "Yet Another Markup Language"), is a popular human-readable data serialization language that is commonly used for configuration files and in applications where data needs to be stored or transmitted.

One of the most useful data types in YAML is the array (the YAML specification calls it a sequence). YAML arrays let you store multiple values under a single key, which helps organize complex data and makes it easy to iterate through.

In this comprehensive guide, we'll cover everything you need to know about YAML arrays, including:

  • What are YAML arrays and why are they useful?
  • The syntax for defining YAML arrays
  • Multi-line vs single-line array formats
  • Nesting arrays and sub-arrays
  • Flattening and merging YAML array data
  • Working with arrays in code
  • Best practices for working with YAML arrays

What are YAML Arrays?

A YAML array is a data structure that stores an ordered sequence of values under a single key. The values can be of any data type, including strings, numbers, booleans, objects (mappings), or even other arrays.

For example, here is a simple YAML file with an array:

fruits:
  - Apple 
  - Orange
  - Strawberry
  - Mango

The key fruits contains an array with four string values. Each value starts with a hyphen followed by a space (- ), which marks a new element.
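To see what a parser makes of this, here is a minimal sketch using PyYAML (the third-party `yaml` package, installed with `pip install pyyaml`). The snippet is embedded as a string so the example is self-contained.

```python
import yaml  # PyYAML: pip install pyyaml

doc = """
fruits:
  - Apple
  - Orange
  - Strawberry
  - Mango
"""

data = yaml.safe_load(doc)   # the top-level mapping becomes a dict
fruits = data["fruits"]      # the array becomes a Python list

print(type(fruits).__name__)  # list
print(fruits)                 # ['Apple', 'Orange', 'Strawberry', 'Mango']
```

Order is preserved: the list elements appear in the same order as the hyphenated lines.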

Arrays are useful whenever you need to represent a collection of data, such as a list of menu items. The array groups the data together under a common key.

Some other common use cases for YAML arrays include:

Managing database records such as users, products, etc.:

users:
  - id: 1
    name: John
  - id: 2  
    name: Sarah
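Because each item here is a mapping, a loader returns a list of dictionaries. A short sketch with PyYAML; the id-to-name lookup table is only an illustration and assumes ids are unique.

```python
import yaml  # PyYAML

doc = """
users:
  - id: 1
    name: John
  - id: 2
    name: Sarah
"""

users = yaml.safe_load(doc)["users"]  # list of dicts, in file order

# Build a lookup table keyed by id (assumes every id is unique).
names_by_id = {user["id"]: user["name"] for user in users}
print(names_by_id[2])  # Sarah
```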

Storing scientific data sets and calculations:

sensor_readings:
  - [0.58, 0.61, 0.63]
  - [0.87, 0.92, 0.90] 
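Each inner flow array loads as its own list of floats, so per-row calculations are straightforward. A sketch with PyYAML and the standard `statistics` module, where averaging each row is just an example calculation:

```python
import statistics

import yaml  # PyYAML

doc = """
sensor_readings:
  - [0.58, 0.61, 0.63]
  - [0.87, 0.92, 0.90]
"""

readings = yaml.safe_load(doc)["sensor_readings"]  # list of lists of floats

# Average each inner array (one row of readings per item), rounded for display.
averages = [round(statistics.mean(row), 3) for row in readings]
print(averages)
```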

Defining machine learning training features:

features:
  - duration
  - number_of_clicks    
  - time_of_day

Any time you need ordered collections of data, YAML arrays are a great choice.

Using arrays can simplify your data model considerably, and loaded arrays are easy to iterate through in your applications.

YAML Array Syntax

YAML offers two main syntax formats for defining arrays:

Multi-line Arrays

The most common way of declaring arrays in YAML files is with each element on its own line:

fruits:
  - Apple  
  - Orange
  - Strawberry

This is the standard YAML array syntax, which the YAML specification calls block style. Each array item starts with a hyphen followed by a space, then the value.

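Indenting the elements under the key visually indicates that they belong to it. When the array is the value of a key, this indentation is optional (YAML also accepts the hyphens at the same column as the key), but it is considered best practice for readability. Either way, every item in the same array must start at the same column.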

A few key points on multi-line YAML array syntax:

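  • A hyphen followed by a space (- ) denotes each new element
  • Items are indented consistently (usually 2 spaces; tabs are not allowed)
  • A value can span multiple lines if needed (for example, with the | or > block scalar indicators)
  • Strings don't need quotes, but quote values that would otherwise be read as another type (such as yes, true, or 3.10) or that contain a colon followed by a space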
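The quoting caveat is easy to trip over. A sketch with PyYAML, which follows YAML 1.1 rules; under those rules an unquoted yes is a boolean, while YAML 1.2 parsers may treat it as a plain string, so check what your own parser does.

```python
import yaml  # PyYAML, which follows YAML 1.1 scalar rules

doc = """
values:
  - Apple     # plain string
  - yes       # YAML 1.1 boolean, not the string "yes"
  - 'yes'     # quoted: stays a string
  - 3.10      # float, so the trailing zero is lost
  - "3.10"    # quoted: stays the string "3.10"
"""

values = yaml.safe_load(doc)["values"]
print(values)  # ['Apple', True, 'yes', 3.1, '3.10']
```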

You can mix value types within the same array:

mixed_array:
  - Apple
  - 18 
  - true
  - 
    name: John
    age: 35
  - [Cherry, Banana]

Here we have a string, a number, a boolean, an object (mapping), and a nested array together, which demonstrates the flexibility of YAML arrays.
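Loading the mixed array shows which native type each element becomes. A sketch with PyYAML; other libraries map these types to their own language equivalents.

```python
import yaml  # PyYAML

doc = """
mixed_array:
  - Apple
  - 18
  - true
  -
    name: John
    age: 35
  - [Cherry, Banana]
"""

items = yaml.safe_load(doc)["mixed_array"]

# Print each element with the Python type the loader chose for it.
for item in items:
    print(type(item).__name__, item)
# str Apple
# int 18
# bool True
# dict {'name': 'John', 'age': 35}
# list ['Cherry', 'Banana']
```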

Single-Line Arrays

You can also define arrays on a single line in YAML using inline syntax:

fruits: [Apple, Orange, Strawberry]

Instead of placing each value on its own line, the values are separated by commas and enclosed in square brackets ([ ]). The YAML specification calls this flow style.

Some points on single-line YAML array syntax:

  • Comma-separate each value
  • Enclose everything in square brackets
  • Useful for small, simple arrays

Single-line format works best for short, simple arrays, while multi-line works better for larger or nested data. Both are valid, and you can mix styles freely within the same file.
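The choice between block style (one item per line) and flow style (inline brackets) is purely presentational: both load to the same data. A sketch with PyYAML, whose `safe_dump` accepts a `default_flow_style` argument to choose the output style:

```python
import yaml  # PyYAML

block = """
fruits:
  - Apple
  - Orange
  - Strawberry
"""
flow = "fruits: [Apple, Orange, Strawberry]"

# Both styles describe exactly the same data once loaded.
assert yaml.safe_load(block) == yaml.safe_load(flow)

data = {"fruits": ["Apple", "Orange"]}
print(yaml.safe_dump(data, default_flow_style=False))  # block style
print(yaml.safe_dump(data, default_flow_style=True))   # flow style
```

Note that PyYAML's block output places the hyphens at the same column as the key, which is valid YAML even though many hand-written files indent them.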

Comparing Array Syntax Options

To quickly compare, here are the main YAML array syntax formats side-by-side:

Multi-line (block style)
  fruits:
    - Apple
    - Orange
  Each value on its own line. A hyphen indicates each new element.

Single-line (flow style)
  fruits: [Apple, Orange]
  Inline syntax. Comma-separated values in square brackets.

The multi-line format tends to be preferred because it stays readable with large, complex data. However, single-line works well for compact, smaller arrays.

Working with Nested Arrays and Sub-Arrays

One powerful feature of YAML arrays is the ability to nest arrays inside other arrays; the inner arrays are often called sub-arrays.

For example:

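companies:
  - Company One:
      - John (CEO)
      - Sarah (COO)

  - Company Two:
      - Bill (President)
      - Susan (CTO)

Each item in the companies array maps a company name to a sub-array of management details. Note the colon after each company name: without it, a YAML parser reads the indented lines as a continuation of the company name (one long string) rather than as a sub-array.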

This helps organize related data together, almost like folders or categories.

To nest an array, indent it under an array item: either as the value of a key inside that item, as above, or directly as the item itself (for example, - [John, Sarah]).

You can nest arrays to any depth to represent richer relationships:

continents:
  - North America:
      - Canada
      - United States
      - Mexico

  - Europe:
      - Germany
      - France
      - Spain

Here each continent holds a sub-array of countries, and each country could in turn hold a sub-array of states or provinces at a deeper level.
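After loading, each continent is a one-key dictionary whose value is the list of countries, so walking the structure takes two loops. A sketch with PyYAML:

```python
import yaml  # PyYAML

doc = """
continents:
  - North America:
      - Canada
      - United States
      - Mexico
  - Europe:
      - Germany
      - France
      - Spain
"""

continents = yaml.safe_load(doc)["continents"]

# Each item is a one-key dict whose value is the sub-array of countries.
for entry in continents:
    for name, countries in entry.items():
        print(f"{name}: {len(countries)} countries")
# North America: 3 countries
# Europe: 3 countries
```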

Referencing Array Elements

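YAML has no syntax for referencing an array element by its index position. Instead, you mark a value with an anchor (&name) and reuse it elsewhere with an alias (*name):

fruits:
  - Apple
  - &favorite_fruit Orange
  - Banana

favorite: *favorite_fruit

Here &favorite_fruit labels the value Orange, and the alias *favorite_fruit sets favorite to that same value. An alias such as *fruits_1 works only if an anchor named fruits_1 is defined earlier in the document; otherwise the parser reports an undefined alias. Index positions (starting at 0 in most programming languages) are available only after you load the data into code.

Anchors and aliases are helpful when data repeats and you want to define it in one place.

Use them judiciously, though: an alias must appear after its anchor, renaming or removing an anchor breaks every alias that uses it, and heavy aliasing makes a file harder to read.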
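A quick way to check how anchors and aliases behave is to load them. This PyYAML sketch resolves an alias to its anchored value and shows that an alias with no matching anchor (here `*fruits_1`) is rejected:

```python
import yaml  # PyYAML

doc = """
fruits:
  - Apple
  - &favorite_fruit Orange
  - Banana
favorite: *favorite_fruit
"""

data = yaml.safe_load(doc)
print(data["favorite"])  # Orange

# An alias with no matching anchor is a parse error, not an index lookup.
rejected = False
try:
    yaml.safe_load("fruits: [Apple, Orange]\nfavorite: *fruits_1\n")
except yaml.YAMLError as err:
    rejected = True
    print("rejected:", type(err).__name__)
```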

Flattening and Merging YAML Arrays

Sometimes you may need to manipulate array structures – either flattening nested arrays or merging top-level peer arrays together.

Here is an example of flattening nested arrays:

Input

companies:
  - Company One:
      - John
      - Sarah

  - Company Two:
      - Bill
      - Susan

Output

employees:
  - John
  - Sarah 
  - Bill
  - Susan

The nested structure was flattened into a simple array.
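One way to do the flattening in Python, assuming each company is written as a one-key mapping whose value is its list of employees. This sketch loads with PyYAML and uses the standard `itertools.chain.from_iterable` to join the sub-arrays in order:

```python
from itertools import chain

import yaml  # PyYAML

doc = """
companies:
  - Company One:
      - John
      - Sarah
  - Company Two:
      - Bill
      - Susan
"""

companies = yaml.safe_load(doc)["companies"]

# Each company is a one-key dict; chain its employee lists end to end.
employees = list(chain.from_iterable(
    staff for company in companies for staff in company.values()
))

print(yaml.safe_dump({"employees": employees}, default_flow_style=False))
```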

Arrays can also be merged (concatenated):

Input

fruits: 
  - Apple
  - Orange

vegetables:
  - Carrot
  - Celery  

Output

produce:
  - Apple 
  - Orange
  - Carrot
  - Celery

The two peer arrays get merged together.

YAML itself has no syntax for flattening or concatenating arrays. The << merge key supported by some parsers applies to mappings, not arrays, so you'll need to flatten or merge arrays in application code after loading the data.
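A minimal sketch of the merge in Python after loading with PyYAML. List concatenation keeps each array's order; the de-duplication step is optional and only matters if the arrays can share values.

```python
import yaml  # PyYAML

doc = """
fruits:
  - Apple
  - Orange
vegetables:
  - Carrot
  - Celery
"""

data = yaml.safe_load(doc)

# Concatenate: fruits first, then vegetables, each in its original order.
produce = data["fruits"] + data["vegetables"]

# Optional: drop duplicates while keeping first-seen order (dicts keep insertion order).
produce_unique = list(dict.fromkeys(produce))

print(yaml.safe_dump({"produce": produce}, default_flow_style=False))
```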

Working with Arrays in Code

When you load YAML data into an application, arrays translate easily into native data structures.

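For example, here is how YAML arrays typically load in various languages (the exact type depends on the YAML library and, in statically typed languages, on the type you deserialize into):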

Language      Loads as
JavaScript    Array
Python        list
Java          ArrayList (a java.util.List)
C#            List<T> or an array (T[])

This means you can immediately iterate through the arrays and access elements programmatically.

Consider this YAML data:

fruits: 
  - Apple
  - Orange
  - Banana

And here is how to iterate over it in Python:

import yaml  # PyYAML: pip install pyyaml

with open("data.yaml") as f:
    data = yaml.safe_load(f)

for fruit in data["fruits"]:
    print(fruit)

# Apple
# Orange
# Banana

The built-in data structures make it very easy to work with loaded YAML arrays.
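Positional access also happens in code rather than in the YAML file. A sketch with PyYAML, using the same fruits data embedded as a string:

```python
import yaml  # PyYAML

doc = """
fruits:
  - Apple
  - Orange
  - Banana
"""

fruits = yaml.safe_load(doc)["fruits"]

print(fruits[1])   # Orange: Python lists are zero-indexed
print(fruits[-1])  # Banana: negative indexes count from the end

# enumerate() pairs each element with its position.
for position, fruit in enumerate(fruits, start=1):
    print(f"{position}. {fruit}")
```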

Best Practices for YAML Arrays

When working with arrays in YAML data, consider these best practices:

Use consistent indentation

Indent all arrays consistently, especially nested ones. Consistent indentation makes larger, complex data sets much easier to read.

Break long arrays into lines

Avoid cramming long arrays onto one line. Use the multi-line format, with each value on its own line after a hyphen.

Add comments for context

Use YAML comments (lines starting with #) above arrays to document what information each array contains, especially for nested data.

Use anchors and aliases carefully

Anchors and aliases reduce duplication, but they add indirection: renaming or removing an anchor breaks every alias that points to it, and readers must jump around the file to see the actual value. Use them when the shared value is stable.

Validate all array changes

Use a linter such as yamllint, or another YAML validator, to check that arrays remain valid after edits, so you catch issues early.
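A linter checks syntax, but it won't notice an array whose items have the wrong type. A minimal sketch of such a check in Python with PyYAML; `load_string_list` is a hypothetical helper, not a library function, and real projects might use a schema validator instead.

```python
import yaml  # PyYAML


def load_string_list(text, key):
    """Hypothetical helper: load YAML text and check that `key` holds a list of strings."""
    data = yaml.safe_load(text)  # raises yaml.YAMLError on malformed YAML
    if not isinstance(data, dict) or key not in data:
        raise ValueError(f"missing top-level key {key!r}")
    value = data[key]
    if not isinstance(value, list):
        raise ValueError(f"{key!r} should be an array, got {type(value).__name__}")
    bad = [item for item in value if not isinstance(item, str)]
    if bad:
        raise ValueError(f"{key!r} contains non-string items: {bad!r}")
    return value


print(load_string_list("fruits: [Apple, Orange]", "fruits"))

try:
    load_string_list("fruits:\n  - Apple\n  - yes\n", "fruits")
except ValueError as err:
    print("invalid:", err)  # the unquoted yes loaded as a boolean
```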

Sticking to best practices ensures your YAML array data remains usable and robust against downstream changes.

Summary

This guide provided a deep look at arrays in YAML including:

  • What YAML arrays are, with use cases such as database records, scientific data, and ML features
  • The core syntax formats, including multi-line and single-line arrays
  • Nesting sub-arrays to represent rich relationships
  • Flattening and merging array structures in application code
  • Built-in language integrations for iterating YAML arrays
  • Best practices for working with array-based data

As you can see, YAML treats arrays as a first-class data structure, making them invaluable for organizing ordered collections.

Both developers and IT teams can leverage YAML's strong array support to model complex data relationships in their applications and infrastructure.

Understanding arrays fully unlocks the rich data modeling capabilities of YAML when accuracy and flexibility matter.

