PHP Generators
Introduction to PHP Generators
PHP generators provide a powerful way to create iterators with minimal memory usage. Unlike traditional arrays or iterators that store all data in memory, generators produce values on-demand, making them ideal for processing large datasets or infinite sequences.
A generator is a special type of iterator that yields values one at a time rather than creating them all at once. This lazy evaluation approach offers significant memory savings and performance benefits when working with large amounts of data.
Why Generators Matter
Memory Efficiency: Traditional approaches might load millions of records into memory simultaneously. Generators process one item at a time, using constant memory regardless of dataset size.
Performance Benefits: By avoiding the overhead of creating large arrays, generators can significantly improve performance for data processing tasks.
Simplified Code: Generators eliminate the complexity of implementing the Iterator interface manually while providing the same functionality.
Lazy Evaluation: Values are computed only when needed, allowing for efficient processing of potentially infinite sequences.
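The last point is easiest to see with an infinite sequence. A minimal sketch (illustrative only; the naturals() name is an assumption, not a built-in):

```php
<?php
// Lazy evaluation in miniature: an infinite sequence of natural
// numbers. The generator never materializes the sequence; the
// caller decides how many values to consume.
function naturals(): Generator
{
    $i = 1;
    while (true) {
        yield $i++;
    }
}

$squares = [];
foreach (naturals() as $n) {
    if ($n > 5) {
        break; // stop consuming; the rest of the sequence is never computed
    }
    $squares[] = $n * $n;
}

echo implode(', ', $squares) . "\n"; // 1, 4, 9, 16, 25
```

Breaking out of the loop simply abandons the generator; no cleanup is required for a pure computation like this.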
The Problem Generators Solve
Consider processing a million-record CSV file. Traditional approaches face significant challenges:
Array Approach Problems:
- Loading all records creates a massive memory footprint
- The entire file must be read before processing begins
- Memory limits can cause crashes with large files
- Processing delay increases with file size
Manual Iterator Complexity:
- Implementing the Iterator interface requires boilerplate code
- State management becomes complex
- Error handling is cumbersome
- Code becomes less readable
Generators elegantly solve these issues by providing a simple syntax for creating memory-efficient iterators that process data on-demand.
How Generators Work
Generators use the yield keyword instead of return. When a function contains yield, PHP automatically creates a Generator object that implements the Iterator interface. Each time yield is encountered, the function's execution is suspended, the yielded value is returned to the caller, and execution resumes from that point when the next value is requested.
The Generator Lifecycle:
- Creation: A function containing yield returns a Generator object
- First Iteration: The function executes until the first yield
- Suspension: The function's state is saved and the yielded value is returned
- Resumption: The next iteration continues from just after the yield
- Completion: The function ends or returns
This suspension and resumption mechanism is what enables generators to use constant memory while processing unlimited data.
Basic Generator Syntax
Simple Generator Example
<?php

/**
 * Basic generator that yields numbers 1 through 5
 *
 * This demonstrates the fundamental concept of generators:
 * values are produced on-demand rather than all at once.
 */
function simpleGenerator(): Generator
{
    echo "Starting generator\n";
    for ($i = 1; $i <= 5; $i++) {
        echo "About to yield $i\n";
        yield $i;
        echo "Resumed after yielding $i\n";
    }
    echo "Generator finished\n";
}

// Using the generator
echo "Creating generator\n";
$gen = simpleGenerator();

echo "Starting foreach loop\n";
foreach ($gen as $value) {
    echo "Received: $value\n";
}
?>
Execution Flow Explained:
This example reveals the interleaved execution pattern of generators:
- Generator Creation: The function doesn't execute immediately. Instead, it returns a Generator object.
- First Iteration: The foreach loop triggers execution until the first yield.
- Value Transfer: The yielded value is passed to the foreach loop.
- Suspended State: The generator remembers its position and variable values.
- Resumption: The next iteration continues execution from where it paused.
This pattern demonstrates how generators maintain state between iterations without keeping all values in memory.
Generator with Key-Value Pairs
<?php

/**
 * Generator that yields key-value pairs
 *
 * Generators can yield both keys and values, making them
 * useful for associative data processing.
 */
function keyValueGenerator(): Generator
{
    $data = [
        'name' => 'John Doe',
        'email' => 'john@example.com',
        'age' => 30,
        'city' => 'New York'
    ];

    foreach ($data as $key => $value) {
        yield $key => $value;
    }
}

// Using key-value generator
foreach (keyValueGenerator() as $key => $value) {
    echo "$key: $value\n";
}
?>
Key-Value Yielding Benefits:
The ability to yield key-value pairs makes generators suitable for:
- Associative Array Processing: Maintain key relationships
- Dictionary Operations: Build map-like structures lazily
- Data Transformation: Convert between formats while preserving keys
- Filtering Operations: Skip entries without breaking key associations
This feature means generators can stand in for arrays in most forward-iteration contexts while maintaining memory efficiency.
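The filtering point can be sketched with a hypothetical key-preserving filter stage (filterWithKeys is an illustrative name, not a built-in; it does what array_filter does, but without building an intermediate array):

```php
<?php
// A filter stage that skips entries while preserving the original
// keys, yielding matches one at a time.
function filterWithKeys(iterable $input, callable $predicate): Generator
{
    foreach ($input as $key => $value) {
        if ($predicate($value)) {
            yield $key => $value;
        }
    }
}

$config = ['debug' => false, 'cache' => true, 'log' => true];

foreach (filterWithKeys($config, fn($v) => $v === true) as $key => $value) {
    echo "$key is enabled\n"; // cache is enabled, log is enabled
}
```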
Memory Efficiency Demonstration
Traditional Array Approach vs Generators
<?php

/**
 * Comparison showing memory usage difference between
 * traditional arrays and generators for large datasets.
 */

// Traditional approach - high memory usage
function createLargeArray(int $size): array
{
    $array = [];
    for ($i = 0; $i < $size; $i++) {
        $array[] = "Item number $i with some additional data to increase memory usage";
    }
    return $array;
}

// Generator approach - constant memory usage
function createLargeGenerator(int $size): Generator
{
    for ($i = 0; $i < $size; $i++) {
        yield "Item number $i with some additional data to increase memory usage";
    }
}

// Memory usage measurement
function measureMemory(callable $callback, ...$args): void
{
    $startMemory = memory_get_usage();
    $result = $callback(...$args);

    // Process the data to ensure it's actually used
    $count = 0;
    foreach ($result as $item) {
        $count++;
        if ($count >= 1000) {
            break; // Process only first 1000 items for demo
        }
    }

    $endMemory = memory_get_usage();
    $memoryUsed = $endMemory - $startMemory;
    echo "Memory used: " . number_format($memoryUsed) . " bytes\n";
}

echo "Processing 100,000 items:\n";
echo "Array approach: ";
measureMemory('createLargeArray', 100000);

echo "Generator approach: ";
measureMemory('createLargeGenerator', 100000);
?>
Memory Usage Analysis:
The dramatic difference in memory usage demonstrates why generators are essential for large-scale data processing:
Array Approach:
- Allocates memory for all 100,000 strings simultaneously
- Memory usage grows linearly with data size
- Can hit PHP memory limits with large datasets
- All data must exist in memory before processing begins
Generator Approach:
- Only current item exists in memory
- Memory usage remains constant regardless of data size
- Can process infinite sequences
- Processing can begin immediately
This comparison clearly shows generators' superiority for memory-constrained environments or large dataset processing.
Practical Applications
Reading Large Files
<?php

/**
 * Generator for reading large files line by line
 *
 * This approach allows processing files of any size without
 * loading the entire file into memory.
 */
function readLargeFile(string $filename): Generator
{
    $handle = fopen($filename, 'r');
    if (!$handle) {
        throw new InvalidArgumentException("Cannot open file: $filename");
    }

    try {
        $lineNumber = 1;
        while (($line = fgets($handle)) !== false) {
            yield $lineNumber => rtrim($line, "\r\n");
            $lineNumber++;
        }
    } finally {
        fclose($handle);
    }
}

/**
 * Process CSV data using generators
 *
 * This generator parses CSV files efficiently, yielding
 * one row at a time with proper header handling.
 */
function readCSV(string $filename, bool $hasHeader = true): Generator
{
    $handle = fopen($filename, 'r');
    if (!$handle) {
        throw new InvalidArgumentException("Cannot open CSV file: $filename");
    }

    try {
        $headers = null;
        if ($hasHeader) {
            $headers = fgetcsv($handle);
        }

        while (($row = fgetcsv($handle)) !== false) {
            if ($headers) {
                yield array_combine($headers, $row);
            } else {
                yield $row;
            }
        }
    } finally {
        fclose($handle);
    }
}

// Example usage
/*
foreach (readCSV('large_data.csv') as $row) {
    // Process each row individually
    echo "Processing user: " . $row['name'] . "\n";
}
*/
?>
File Processing Advantages:
Resource Management:
- The finally block ensures file handles are always closed
- Memory usage remains constant regardless of file size
- Files can be processed as they're being written
- Network streams can be processed in real-time
Error Handling:
- Exceptions propagate normally through generators
- Resource cleanup happens even on errors
- Invalid data can be skipped without loading entire file
Flexibility:
- Line numbers help with error reporting
- Headers can be dynamically detected
- Different file formats can share similar patterns
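The "skip invalid data" idea can be sketched as a wrapper stage over any row source. Here validRows() and its $requiredFields parameter are illustrative assumptions, not part of the functions above:

```php
<?php
// Yield only rows that have all required fields, preserving the
// original line numbers as keys for error reporting.
function validRows(iterable $rows, array $requiredFields): Generator
{
    foreach ($rows as $lineNumber => $row) {
        foreach ($requiredFields as $field) {
            if (empty($row[$field])) {
                continue 2; // skip rows missing a required field
            }
        }
        yield $lineNumber => $row;
    }
}

$rows = [
    1 => ['name' => 'John', 'email' => 'john@example.com'],
    2 => ['name' => '', 'email' => 'no-name@example.com'], // invalid
    3 => ['name' => 'Alice', 'email' => 'alice@example.com'],
];

foreach (validRows($rows, ['name', 'email']) as $line => $row) {
    echo "Line $line: {$row['name']}\n";
}
// Line 1: John
// Line 3: Alice
```

Because the wrapper is itself lazy, it composes directly with a generator such as readCSV() without changing its memory profile.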
Database Result Processing
<?php

/**
 * Generator for processing database results efficiently
 *
 * Instead of loading all results into memory, this generator
 * fetches and yields one row at a time.
 */
function fetchResults(PDO $pdo, string $query, array $params = []): Generator
{
    $stmt = $pdo->prepare($query);
    $stmt->execute($params);

    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        yield $row;
    }
}

/**
 * Generator for paginated database results
 *
 * This generator handles large datasets by fetching results
 * in chunks (pages) and yielding individual records.
 */
function fetchPaginated(PDO $pdo, string $table, int $pageSize = 1000): Generator
{
    $offset = 0;
    do {
        // Note: $table must come from trusted code, never user input,
        // because identifiers cannot be bound as prepared-statement parameters.
        $query = "SELECT * FROM $table LIMIT $pageSize OFFSET $offset";
        $stmt = $pdo->query($query);
        $results = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($results as $row) {
            yield $row;
        }

        $offset += $pageSize;
    } while (count($results) === $pageSize);
}

// Example usage with error handling
/*
try {
    $pdo = new PDO($dsn, $username, $password);

    foreach (fetchResults($pdo, 'SELECT * FROM users WHERE active = ?', [1]) as $user) {
        echo "Processing user: " . $user['name'] . "\n";
        // Perform operations on each user
        // Memory usage remains constant regardless of result set size
    }
} catch (PDOException $e) {
    echo "Database error: " . $e->getMessage();
}
*/
?>
Database Processing Benefits:
Streaming Results:
- Results are fetched as needed, not all at once
- Database connection remains open during iteration
- Server-side cursors can be utilized
- Memory usage is predictable and minimal
Pagination Strategy:
- Breaks large queries into manageable chunks
- Prevents query timeouts on massive tables
- Allows progress tracking and resumable operations
- Can be combined with transaction batching
Real-World Applications:
- Data migration between systems
- Report generation from large datasets
- ETL (Extract, Transform, Load) operations
- Real-time data processing pipelines
Advanced Generator Techniques
Generator Delegation
<?php

/**
 * Generator delegation using yield from
 *
 * The yield from syntax allows one generator to delegate
 * to another, creating powerful composition patterns.
 */
function numberGenerator(int $start, int $end): Generator
{
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}

function combinedGenerator(): Generator
{
    // Yield from multiple generators
    yield from numberGenerator(1, 3);
    yield from numberGenerator(10, 12);

    // Regular yields can be mixed in
    yield 'separator';

    yield from numberGenerator(20, 22);
}

// Usage
foreach (combinedGenerator() as $value) {
    echo "$value\n";
}
// Output: 1, 2, 3, 10, 11, 12, separator, 20, 21, 22
?>
Generator Delegation Explained:
The yield from expression enables powerful composition patterns:
Delegation Benefits:
- Modularity: Break complex generators into simple components
- Reusability: Share generator logic across applications
- Transparency: Inner generator exceptions bubble up naturally
- Efficiency: No overhead compared to manual yielding
Use Cases:
- Merging multiple data sources
- Building generator pipelines
- Implementing recursive algorithms
- Creating abstract data processors
This feature transforms generators from simple iterators into composable building blocks for complex data processing systems.
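The recursive use case can be sketched by flattening a nested array, where each level delegates with yield from:

```php
<?php
// Flatten arbitrarily nested arrays into one flat sequence.
// Each recursive call delegates via yield from, so the caller
// sees a single stream regardless of nesting depth.
function flatten(array $items): Generator
{
    foreach ($items as $item) {
        if (is_array($item)) {
            yield from flatten($item);
        } else {
            yield $item;
        }
    }
}

$nested = [1, [2, [3, 4]], 5];

// Pass false to ignore keys: delegated generators restart their
// keys at 0, so preserving keys would cause collisions.
echo implode(', ', iterator_to_array(flatten($nested), false)) . "\n";
// 1, 2, 3, 4, 5
```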
Generators with Send and Throw
<?php

/**
 * Interactive generator that can receive input
 *
 * Generators can receive values through the send() method,
 * enabling two-way communication.
 */
function interactiveGenerator(): Generator
{
    echo "Generator started\n";

    $value = yield 'first';
    echo "Received: $value\n";

    $value = yield 'second';
    echo "Received: $value\n";

    $value = yield 'third';
    echo "Received: $value\n";

    return 'Generator finished';
}

// Using send() to communicate with generator
$gen = interactiveGenerator();

echo $gen->current() . "\n"; // 'first'
$gen->send('hello');
echo $gen->current() . "\n"; // 'second'
$gen->send('world');
echo $gen->current() . "\n"; // 'third'
$gen->send('!');

// Get return value (PHP 7+)
$returnValue = $gen->getReturn();
echo $returnValue . "\n"; // 'Generator finished'
?>
Bidirectional Communication:
The send() method enables sophisticated generator patterns:
Communication Flow:
- Generator yields initial value
- Caller processes value and sends response
- Generator receives response and continues
- Process repeats until generator completes
Advanced Patterns:
- Coroutines: Cooperative multitasking
- State Machines: Event-driven processing
- Parser Generators: Token-based parsing
- Async Simulation: Simulating asynchronous operations
This bidirectional capability transforms generators from passive data sources into active participants in complex algorithms.
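A small coroutine sketch of this flow: a running-average accumulator driven entirely through send(). The pattern and names are illustrative, not a standard API:

```php
<?php
// Each send() delivers a new sample; the generator yields the
// updated average back to the caller.
function runningAverage(): Generator
{
    $sum = 0;
    $count = 0;
    $average = null;
    while (true) {
        $sample = yield $average;
        $sum += $sample;
        $count++;
        $average = $sum / $count;
    }
}

$avg = runningAverage();
$avg->current(); // advance to the first yield before sending

echo $avg->send(10) . "\n"; // 10
echo $avg->send(20) . "\n"; // 15
echo $avg->send(30) . "\n"; // 20
```

Note the priming step: current() (or rewind()) must run the generator to its first yield before send() can deliver a value into it.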
Generator Pipeline Processing
<?php

/**
 * Data processing pipeline using generators
 *
 * This demonstrates how generators can be chained together
 * to create efficient data processing pipelines.
 */
function dataSource(): Generator
{
    $data = [
        ['name' => 'John', 'age' => 25, 'salary' => 50000],
        ['name' => 'Jane', 'age' => 30, 'salary' => 75000],
        ['name' => 'Bob', 'age' => 35, 'salary' => 60000],
        ['name' => 'Alice', 'age' => 28, 'salary' => 80000],
    ];

    foreach ($data as $record) {
        yield $record;
    }
}

function filterByAge(Generator $input, int $minAge): Generator
{
    foreach ($input as $record) {
        if ($record['age'] >= $minAge) {
            yield $record;
        }
    }
}

function transformSalary(Generator $input, float $multiplier): Generator
{
    foreach ($input as $record) {
        $record['salary'] *= $multiplier;
        yield $record;
    }
}

function formatOutput(Generator $input): Generator
{
    foreach ($input as $record) {
        yield sprintf(
            "%s (age %d) earns $%s",
            $record['name'],
            $record['age'],
            number_format($record['salary'])
        );
    }
}

// Create processing pipeline
$pipeline = formatOutput(
    transformSalary(
        filterByAge(dataSource(), 30),
        1.1 // 10% salary increase
    )
);

// Process data through pipeline
foreach ($pipeline as $result) {
    echo $result . "\n";
}
?>
Pipeline Architecture Benefits:
Composability:
- Each stage has a single responsibility
- Stages can be mixed and matched
- New stages integrate seamlessly
- Testing individual stages is straightforward
Memory Efficiency:
- Only one record exists in memory at any time
- Pipeline length doesn't affect memory usage
- Can process infinite streams
- Early termination is possible
Flexibility:
- Stages can be conditionally included
- Parameters allow runtime configuration
- Error handling can be centralized
- Progress tracking is simple to add
This pattern enables building complex data processing systems from simple, reusable components.
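The early-termination point can be sketched with one more hypothetical stage: a take() stage that stops the whole lazy chain after a fixed number of records (take() and infiniteNumbers() are illustrative names, not built-ins):

```php
<?php
// Stop the pipeline after $limit records. Because every upstream
// stage is lazy, no extra records are ever produced.
function take(iterable $input, int $limit): Generator
{
    $count = 0;
    foreach ($input as $key => $value) {
        if ($count++ >= $limit) {
            return; // ends this stage and, transitively, all upstream stages
        }
        yield $key => $value;
    }
}

function infiniteNumbers(): Generator
{
    $i = 1;
    while (true) {
        yield $i++;
    }
}

foreach (take(infiniteNumbers(), 3) as $n) {
    echo "$n\n"; // 1, 2, 3 -- then the infinite source is abandoned
}
```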
Performance Considerations
When to Use Generators
Large Datasets: When processing datasets that might not fit in memory.
- Log file analysis
- Database exports
- CSV processing
- API response streaming
Sequential Processing: When you need to process items one at a time in sequence.
- ETL pipelines
- Data validation
- Stream processing
- Real-time analytics
Lazy Evaluation: When expensive computations should only be performed when needed.
- Search algorithms
- Mathematical sequences
- Configuration loading
- Resource initialization
Streaming Data: When processing data streams or real-time feeds.
- WebSocket messages
- Server-sent events
- File uploads
- Network protocols
When NOT to Use Generators
Random Access: When you need to access items by index or in random order.
- Sorting algorithms
- Binary search
- Array shuffling
- Lookup tables
Multiple Iterations: Generators can only be iterated once; rewind requires recreation.
- Caching results
- Multiple passes
- Backtracking algorithms
- Cross-referencing
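The single-pass limitation is easy to demonstrate: a second iteration over the same generator throws, and caching via iterator_to_array() is the usual workaround:

```php
<?php
// A generator cannot be rewound once iteration has started; a
// second pass needs either a fresh generator or a cached array.
function letters(): Generator
{
    yield 'a';
    yield 'b';
}

$gen = letters();
foreach ($gen as $letter) {
    echo $letter; // ab
}

try {
    foreach ($gen as $letter) { // second pass over the same object
        echo $letter;
    }
} catch (Exception $e) {
    echo "\n" . $e->getMessage() . "\n"; // Cannot rewind a generator that was already run
}

// When multiple passes are needed, cache the values instead:
$cached = iterator_to_array(letters());
echo implode('', $cached) . implode('', $cached) . "\n"; // abab
```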
Small Datasets: For small arrays, the overhead might not be worth the complexity.
- Configuration arrays
- Menu items
- Form options
- Static lists
Caching Results: When you need to store and reuse computed values.
- Memoization
- Result caching
- Precomputed values
- Lookup optimization
Best Practices
<?php

/**
 * Generator best practices and common patterns
 */

// 1. Always specify return type for clarity
function numbersGenerator(int $max): Generator
{
    for ($i = 1; $i <= $max; $i++) {
        yield $i;
    }
}

// 2. Handle errors gracefully
function safeFileReader(string $filename): Generator
{
    if (!file_exists($filename)) {
        throw new InvalidArgumentException("File not found: $filename");
    }

    $handle = fopen($filename, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $filename");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            yield trim($line);
        }
    } finally {
        fclose($handle);
    }
}

// 3. Document generator behavior
/**
 * Generate Fibonacci sequence up to a maximum value
 *
 * @param int $max Maximum value to generate
 * @return Generator<int> Yields Fibonacci numbers
 */
function fibonacci(int $max): Generator
{
    $a = 0;
    $b = 1;

    while ($a <= $max) {
        yield $a;
        [$a, $b] = [$b, $a + $b];
    }
}

// 4. Use early termination when appropriate
function searchGenerator(array $data, $searchValue): Generator
{
    foreach ($data as $key => $value) {
        if ($value === $searchValue) {
            yield $key => $value;
            return; // Stop after first match
        }
    }
}
?>
Best Practice Guidelines:
Type Safety:
- Always declare Generator return type
- Use parameter type hints
- Document yielded value types
- Consider creating value objects
Resource Management:
- Use try-finally for cleanup
- Close file handles and connections
- Free resources promptly
- Handle interruptions gracefully
Error Handling:
- Validate inputs before processing
- Throw meaningful exceptions
- Document error conditions
- Provide recovery mechanisms
Documentation:
- Explain what values are yielded
- Document performance characteristics
- Provide usage examples
- Note any limitations
Following these practices ensures generators are reliable, maintainable, and performant components of your application.
Related Topics
For more advanced PHP features:
- PHP Anonymous Functions - Closures and lambda functions
- PHP Iterators - Custom iterator implementations
- PHP Memory Management - Memory optimization techniques
- PHP Performance - Performance optimization strategies
- PHP Streams - Working with data streams
Summary
PHP generators provide powerful capabilities for memory-efficient data processing:
Memory Efficiency: Process large datasets with constant memory usage regardless of size.
Lazy Evaluation: Compute values only when needed, improving performance for selective processing.
Simplified Code: Easier to write and understand compared to custom Iterator implementations.
Flexible Patterns: Support delegation, two-way communication, and pipeline processing.
Real-World Applications: Ideal for file processing, database operations, and data transformation.
Key benefits include reduced memory footprint, improved performance for large datasets, and cleaner, more maintainable code. Generators are particularly valuable when working with data that doesn't fit comfortably in memory or when implementing streaming data processing pipelines.
Remember that generators are forward-only iterators and should be used when sequential processing is sufficient for your use case.