Terminal Guide

Miller

Dev Tools
macOS · Linux · Windows
Go

Like awk/sed/cut for name-indexed data (CSV, JSON, etc.).

Official Website

Features

Multi-format · Streaming · SQL-like · DSL

Why use Miller?

Miller is like awk, sed, and cut for structured data. It works with CSV, TSV, JSON, JSON Lines, and other record formats, providing a unified interface for data transformation, filtering, and aggregation—without leaving the command line.
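
As a first taste, here is a paste-able sketch (the sample data is invented, and it assumes mlr is already on your PATH) showing the same verb working across output formats:

```shell
# Create a tiny sample file (made-up data)
printf 'name,dept,salary\nalice,eng,120\nbob,sales,90\n' > /tmp/people.csv

# One verb vocabulary, any output format:
mlr --icsv --opprint cat /tmp/people.csv   # aligned table
mlr --icsv --ojson  cat /tmp/people.csv   # JSON
```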

Multi-format Support

Process CSV, TSV, JSON, JSON Lines (NDJSON), and more with consistent commands. Convert between formats effortlessly.

SQL-like Operations

Use familiar SQL concepts: SELECT, GROUP BY, WHERE, JOIN, and aggregation functions.
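
As a rough mapping (the table and column names below are invented for illustration): SELECT becomes cut, WHERE becomes filter, GROUP BY becomes stats1 -g, and ORDER BY becomes sort:

```shell
# SQL: SELECT dept, AVG(salary) FROM people WHERE salary > 50 GROUP BY dept ORDER BY dept;
mlr --icsv --opprint \
  filter '$salary > 50' \
  then stats1 -a mean -f salary -g dept \
  then sort -f dept \
  people.csv
```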

Powerful DSL

Miller has its own domain-specific language (the Miller DSL) for complex transformations and custom logic.

Streaming Processing

Process data line-by-line without loading entire files. Perfect for large datasets and pipelines.

Installation

Installation
# macOS (Homebrew)
brew install miller

# Ubuntu/Debian
sudo apt install miller

# Arch Linux
sudo pacman -S miller

# Windows (Chocolatey)
choco install miller

# From source (Miller 6 is written in Go; requires a Go toolchain)
git clone https://github.com/johnkerl/miller
cd miller
make && sudo make install

# Verify the installation
mlr --version

Basic Usage

Working with CSV

CSV Operations
# View CSV data
mlr --csv cat data.csv

# Pretty-print CSV as an aligned table
mlr --icsv --opprint cat data.csv

# Convert CSV to JSON
mlr --csv --ojson cat data.csv > data.json

# Convert CSV to TSV
mlr --icsv --otsv cat data.csv

# Display first N rows
mlr --csv head -n 10 data.csv

Selecting and Filtering

Selection & Filtering
# Select specific columns
mlr --csv cut -f name,email,age data.csv

# Filter rows with condition
mlr --csv filter '$age > 30' data.csv

# Multiple conditions
mlr --csv filter '$age > 25 && $status == "active"' data.csv

# Exclude columns
mlr --csv cut -x -f temp_field data.csv

# Filter and select together
mlr --csv filter '$status == "active"' then cut -f name,email data.csv
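
The last pipeline above can be tried end to end; the data here is invented:

```shell
cat > /tmp/users.csv <<'EOF'
name,email,status
ann,ann@example.com,active
ben,ben@example.com,inactive
EOF

# Keep active users, then project two columns (-o preserves the given field order)
mlr --icsv --ocsv filter '$status == "active"' then cut -o -f name,email /tmp/users.csv
```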

Data Transformation

Transformations
# Rename columns
mlr --csv rename old_name,new_name data.csv

# Add calculated field
mlr --csv put '$total = $price * $quantity' data.csv

# Format strings
mlr --csv put '$name = toupper($name)' data.csv

# Format numbers to two decimal places
mlr --csv put '$value = fmtnum($value, "%.2f")' data.csv

# Conditional assignment
mlr --csv put '$status = $age >= 18 ? "adult" : "minor"' data.csv
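
A paste-able version of the calculated-field example, with invented data:

```shell
printf 'item,price,quantity\nwidget,2.50,4\nbolt,0.10,100\n' > /tmp/orders.csv

# Derive a per-line total; new fields are appended after existing ones
mlr --icsv --opprint put '$total = $price * $quantity' /tmp/orders.csv
```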

Common Patterns

Aggregation and Grouping

Aggregation
# Group by category and count
mlr --csv count -g category data.csv

# Sum values by category
mlr --csv stats1 -a sum -f price -g category data.csv

# Multiple aggregations
mlr --csv stats1 -a count,sum,mean -f price -g department data.csv

# Count distinct values
mlr --csv count-distinct -f product_id data.csv

# Get min/max per group
mlr --csv stats1 -a min,max -f rating -g store_id data.csv
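
To see stats1's naming convention in action (output fields are named field_accumulator), here is a runnable sketch with invented data:

```shell
printf 'category,price\nfruit,2\nfruit,3\nveg,4\n' > /tmp/sales.csv

# One pass, two accumulators, grouped; emits price_count and price_sum columns
mlr --icsv --opprint stats1 -a count,sum -f price -g category /tmp/sales.csv
```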

Sorting and Ordering

Sorting
# Sort by column (lexical)
mlr --csv sort -f name data.csv

# Sort numeric (reverse)
mlr --csv sort -nr salary data.csv

# Sort by multiple columns
mlr --csv sort -f department -nf age data.csv

# Case-insensitive sort
mlr --csv sort -c name data.csv

Format Conversion

Format Conversion
# CSV to JSON
mlr --icsv --ojson cat data.csv

# CSV to TSV
mlr --icsv --otsv cat data.csv

# JSON to CSV
mlr --ijson --ocsv cat data.json

# TSV to JSON
mlr --itsv --ojson cat data.tsv

# Pretty-print JSON (one field per line; the Miller 6 default)
mlr --json --jvstack cat data.json
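
Conversions are lossless round trips for tabular data; this sketch (invented data) goes CSV to JSON and back:

```shell
printf 'id,name\n1,ann\n' > /tmp/x.csv

# CSV -> JSON -> CSV round trip preserves fields and inferred types
mlr --icsv --ojson cat /tmp/x.csv > /tmp/x.json
mlr --ijson --ocsv cat /tmp/x.json
```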

Joining Data

Joining
# Left outer join (users.csv is the left file via -f; --ul also emits unpaired left records)
mlr --csv join --ul -f users.csv -l id -r user_id orders.csv

# Inner join (the default)
mlr --csv join -f file1.csv -j id file2.csv

# Anti-join: left records with no match on the right
mlr --csv join --np --ul -f users.csv -l id -r user_id orders.csv
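
A runnable join sketch with two invented files, keyed by different field names on each side:

```shell
cat > /tmp/users.csv <<'EOF'
id,name
1,ann
2,ben
EOF
cat > /tmp/orders.csv <<'EOF'
user_id,amount
1,50
EOF

# Inner join: users.csv is the left file (-f); only ann has a matching order
mlr --icsv --opprint join -f /tmp/users.csv -l id -r user_id /tmp/orders.csv
```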

Advanced Transformations

Advanced Transforms
# Fill missing fields so all records share the same keys
mlr --csv unsparsify data.csv

# Sample k records uniformly (reservoir sampling)
mlr --csv sample -k 100 data.csv

# Shuffle record order
mlr --csv shuffle data.csv

# Repeat each record
mlr --csv repeat -n 3 data.csv

# Generate sequences and compute on them
mlr --ojson seqgen --start 1 --stop 1000 then put '$squared = $i ** 2'

Advanced Features

Using the Miller DSL

DSL Programming
# Running aggregates with out-of-stream (@-prefixed) variables
mlr --csv put '@sum += $amount; @count += 1; $running_avg = @sum / @count' data.csv

# Define a variable in a begin block, then filter against it
mlr --csv filter 'begin { @threshold = 100 } $value > @threshold' data.csv

# Use built-in functions
mlr --csv put '$lower = tolower($name); $len = strlen($email)' data.csv

# Loop over the fields of each record
mlr --csv put 'for (k, v in $*) { if (is_string(v)) { $[k] = toupper(v) } }' data.csv
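
Out-of-stream @-variables start absent (absent plus a number is the number) and persist across records, which makes running aggregates a one-liner; the data below is invented:

```shell
printf 'amount\n10\n20\n30\n' > /tmp/amounts.csv

# Emits a running sum alongside each record
mlr --icsv --ocsv put '@sum += $amount; $running_sum = @sum' /tmp/amounts.csv
```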

Streaming Processing

Streaming
# Process JSON Lines (newline-delimited JSON)
mlr --jsonl cat large-file.ndjson

# Filter while streaming
mlr --jsonl filter '$status == "active"' large-file.ndjson

# Streaming aggregation
mlr --csv count -g category data.csv

# Process multiple files (Miller handles each file's header)
mlr --csv cat *.csv

# Use a custom field separator
mlr --csv --fs ';' cat custom-data.csv

Statistical Analysis

Statistics
# Compute percentiles
mlr --csv stats1 -a p10,p50,p90 -f salary data.csv

# Count distinct values (cardinality)
mlr --csv count-distinct -f product_id data.csv

# Standard deviation and variance
mlr --csv stats1 -a stddev,var -f price data.csv

# Top N values
mlr --csv top -f value -n 10 data.csv

Regular Expressions

Regex
# Filter with regex
mlr --csv filter '$email =~ ".*@gmail\.com"' data.csv

# Match pattern negation
mlr --csv filter '$email !~ ".*@company\.com"' data.csv

# Extract with regex
mlr --csv put '$domain = sub($email, ".*@", "")' data.csv

# Case-insensitive regex (trailing "i" flag on the pattern)
mlr --csv filter '$name =~ "john"i' data.csv
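
Combining the match and extract steps above into one runnable pipeline, with invented addresses:

```shell
printf 'email\nann@gmail.com\nben@corp.example\n' > /tmp/emails.csv

# Keep gmail addresses, then strip everything through the "@"
mlr --icsv --ocsv filter '$email =~ "@gmail\.com$"' then put '$domain = sub($email, ".*@", "")' /tmp/emails.csv
```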

Command Reference

Command  Description           Example
cat      Output records        mlr --csv cat data.csv
cut      Select columns        mlr --csv cut -f name,email
filter   Filter rows           mlr --csv filter '$age > 30'
put      Add/modify fields     mlr --csv put '$total = $a * $b'
stats1   Aggregate statistics  mlr --csv stats1 -a sum -f price -g category
sort     Sort records          mlr --csv sort -f name
join     Join files            mlr --csv join -f left.csv -j id right.csv

Tips

  • Specify input and output formats explicitly: --icsv/--ojson set them separately, while --csv alone sets both input and output to CSV
  • Use then to chain multiple operations: mlr --csv filter ... then cut ... then sort
  • Miller is excellent for exploratory data analysis—use it to understand your data before loading into databases
  • Miller reads multiple files natively: mlr --csv cat *.csv handles each file's header for you
  • Use --opprint --barred, or --omd for markdown tables, to get pretty-printed output
Written by Dai Aoki. Published: 2026-01-20
