Master Java Streams API for Functional Programming

Java Streams API

The Streams API, introduced in Java 8, revolutionized how developers work with collections and data processing in Java. It provides a functional programming approach to manipulating collections of objects, enabling you to write more readable, maintainable, and efficient code.

What are Streams?

A Stream in Java is not a data structure itself, but rather a sequence of elements from a source that supports aggregate operations. Think of streams as pipelines through which data flows and gets transformed. Unlike collections, streams don't store data - they process it.

Key Characteristics of Streams:

  1. No Storage: Streams don't store elements; they convey elements from a source
  2. Functional: Operations on streams produce results without modifying the source
  3. Lazy Evaluation: Intermediate operations are not executed until a terminal operation is invoked
  4. Possibly Unbounded: Streams can be finite or infinite
  5. Consumable: A stream can be traversed only once, like an iterator; a second terminal operation on the same stream throws IllegalStateException

Stream vs Collection Analogy: Think of a collection as a DVD containing all the movie data, while a stream is like Netflix streaming - you access the content on-demand without storing it locally.

import java.util.*;
import java.util.stream.*;

public class StreamBasics {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Charlie", "David");
        
        // Traditional approach - imperative style
        List<String> longNamesOld = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 4) {
                longNamesOld.add(name.toUpperCase());
            }
        }
        System.out.println("Traditional: " + longNamesOld);
        
        // Stream approach - functional style
        List<String> longNamesNew = names.stream()
            .filter(name -> name.length() > 4)  // Keep names longer than 4 chars
            .map(String::toUpperCase)           // Convert to uppercase
            .collect(Collectors.toList());      // Collect results
        
        System.out.println("Streams: " + longNamesNew);
    }
}
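
Two of the characteristics listed earlier, single use and an unmodified source, can be checked directly. The following is a minimal sketch; the class and method names are illustrative and not part of the Streams API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSingleUseSketch {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("Alice", "Bob", "Charlie");
        stream.count();                 // first terminal operation consumes the stream
        try {
            stream.count();             // second use of the same Stream object
            return false;
        } catch (IllegalStateException e) {
            return true;                // the stream has already been operated upon
        }
    }

    // Runs a pipeline and returns the source list so callers can see it is unchanged.
    static List<String> sourceAfterPipeline() {
        List<String> names = Arrays.asList("Alice", "Bob", "Charlie");
        List<String> upper = names.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());
        System.out.println("Result: " + upper);
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Second use rejected: " + secondUseFails());
        System.out.println("Source afterwards: " + sourceAfterPipeline());
    }
}
```

If the same data has to be processed twice, call stream() on the source again rather than keeping a Stream reference around.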

Creating Streams

Understanding how to create streams is fundamental to using the API effectively. Java provides multiple ways to create streams from different sources.

From Collections

The most common way is creating streams from existing collections:

public class StreamCreation {
    public static void main(String[] args) {
        // From List
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        Stream<Integer> streamFromList = numbers.stream();
        
        // From Array
        String[] words = {"hello", "world", "java", "streams"};
        Stream<String> streamFromArray = Arrays.stream(words);
        
        // From Set
        Set<String> uniqueWords = new HashSet<>(Arrays.asList("apple", "banana", "cherry"));
        Stream<String> streamFromSet = uniqueWords.stream();
        
        // Parallel streams for potentially better performance
        Stream<Integer> parallelStream = numbers.parallelStream();
    }
}
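
A Map has no stream() method of its own, so a stream is usually obtained from one of its views: entrySet(), keySet(), or values(). The sketch below assumes a hypothetical stock map and an illustrative helper named inStock.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapStreamSketch {

    // Returns the keys whose value is at least minStock, in map iteration order.
    static List<String> inStock(Map<String, Integer> stock, int minStock) {
        return stock.entrySet().stream()            // Stream<Map.Entry<String, Integer>>
            .filter(e -> e.getValue() >= minStock)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new LinkedHashMap<>(); // keeps insertion order
        stock.put("apple", 12);
        stock.put("banana", 0);
        stock.put("cherry", 5);
        System.out.println(inStock(stock, 1)); // [apple, cherry]
    }
}
```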

Using Stream Builder Methods

Java provides several static methods to create streams programmatically:

public class StreamBuilders {
    public static void main(String[] args) {
        // Empty stream
        Stream<String> empty = Stream.empty();
        
        // Stream with specific elements
        Stream<String> fruits = Stream.of("apple", "banana", "cherry");
        
        // Generate infinite stream with supplier
        Stream<Double> randomNumbers = Stream.generate(Math::random);
        
        // Generate infinite stream with seed and function
        Stream<Integer> evenNumbers = Stream.iterate(0, n -> n + 2);
        
        // Finite stream with limit
        List<Integer> first10Even = Stream.iterate(0, n -> n + 2)
            .limit(10)
            .collect(Collectors.toList());
        
        System.out.println("First 10 even numbers: " + first10Even);
        
        // Stream from range (primitive streams)
        IntStream range = IntStream.range(1, 6); // 1,2,3,4,5
        IntStream rangeClosed = IntStream.rangeClosed(1, 5); // 1,2,3,4,5
        
        range.forEach(System.out::print); // Prints: 12345
        System.out.println();
        rangeClosed.forEach(System.out::print); // Prints: 12345
    }
}
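
Stream.iterate is not limited to simple counters: the seed can carry more state than a single number. As an illustration only (the class and method names are made up for this sketch), the following generates Fibonacci numbers by iterating over a two-element array and bounding the infinite stream with limit().

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class IterateSketch {

    // First n Fibonacci numbers, using a two-element array as the iteration state.
    static List<Long> fibonacci(int n) {
        return Stream.iterate(new long[] {0, 1}, p -> new long[] {p[1], p[0] + p[1]})
            .limit(n)                  // without limit() this pipeline would never finish
            .map(p -> p[0])            // keep only the current value of each pair
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fibonacci(10)); // [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    }
}
```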

Stream Operations

Stream operations are divided into two categories: intermediate and terminal operations. Understanding this distinction is crucial for mastering streams.

Intermediate Operations

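Intermediate operations transform a stream into another stream. They are lazy: nothing runs until a terminal operation is called. This lets the pipeline apply steps such as filter() and map() to each element in one pass and stop early for short-circuiting operations such as limit() or findFirst().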

Common Intermediate Operations:

  • filter() - excludes elements that don't match a predicate
  • map() - transforms elements using a function
  • flatMap() - flattens nested structures
  • distinct() - removes duplicates
  • sorted() - sorts elements
  • limit() - limits the number of elements
  • skip() - skips elements

import java.util.*;
import java.util.stream.*;

public class IntermediateOperations {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry", "date", "elderberry");
        
        // Filter: Keep only words longer than 5 characters
        Stream<String> longWords = words.stream()
            .filter(word -> word.length() > 5);
        
        // Map: Transform to uppercase
        Stream<String> upperWords = words.stream()
            .map(String::toUpperCase);
        
        // Map: Transform to word lengths
        Stream<Integer> wordLengths = words.stream()
            .map(String::length);
        
        // Chaining operations - this is where streams shine!
        List<String> result = words.stream()
            .filter(word -> word.length() > 4)    // Keep words longer than 4
            .map(String::toUpperCase)             // Convert to uppercase
            .sorted()                             // Sort alphabetically
            .collect(Collectors.toList());        // Terminal operation
        
        System.out.println("Processed words: " + result);
        
        // Distinct: Remove duplicates
        List<String> duplicates = Arrays.asList("apple", "banana", "apple", "cherry", "banana");
        List<String> unique = duplicates.stream()
            .distinct()
            .collect(Collectors.toList());
        System.out.println("Unique words: " + unique);
        
        // Limit and Skip: Pagination-like operations
        List<Integer> numbers = IntStream.rangeClosed(1, 20)
            .boxed()
            .skip(5)                              // Skip first 5 numbers
            .limit(10)                            // Take next 10 numbers
            .collect(Collectors.toList());
        System.out.println("Numbers 6-15: " + numbers);
    }
}
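
Laziness can be observed by counting how often a predicate actually runs. The sketch below is an illustration under simple assumptions (a sequential stream and a hypothetical counter held in an AtomicInteger); it is not how production code should track work.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluationSketch {

    // Counts filter invocations when no terminal operation is attached.
    static int callsBeforeTerminal() {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pending = Stream.of("apple", "banana", "cherry")
            .filter(w -> { calls.incrementAndGet(); return w.length() > 5; });
        return calls.get(); // the pipeline has only been described, not run
    }

    // Counts how many elements the filter examines when limit(1) short-circuits.
    static int callsWithLimit() {
        AtomicInteger calls = new AtomicInteger();
        List<String> firstLong = Stream.of("apple", "banana", "cherry", "date", "elderberry")
            .filter(w -> { calls.incrementAndGet(); return w.length() > 5; })
            .limit(1)
            .collect(Collectors.toList());
        System.out.println("First long word: " + firstLong);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("Calls without terminal op: " + callsBeforeTerminal()); // 0
        System.out.println("Calls with limit(1): " + callsWithLimit());            // 2
    }
}
```

Only "apple" and "banana" are tested in the second method: once limit(1) has its element, the remaining words are never pulled from the source.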

Terminal Operations

Terminal operations consume the stream to produce a final result. Once a terminal operation is executed, the stream is consumed and cannot be reused.

Common Terminal Operations:

  • collect() - gathers elements into a collection
  • forEach() - performs an action on each element
  • reduce() - combines elements into a single result
  • count() - returns the number of elements
  • anyMatch(), allMatch(), noneMatch() - test predicates
  • findFirst(), findAny() - find elements

public class TerminalOperations {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        
        // forEach: Execute action on each element
        System.out.print("Numbers: ");
        numbers.stream().forEach(n -> System.out.print(n + " "));
        System.out.println();
        
        // collect: Gather into collection
        List<Integer> evenNumbers = numbers.stream()
            .filter(n -> n % 2 == 0)
            .collect(Collectors.toList());
        System.out.println("Even numbers: " + evenNumbers);
        
        // reduce: Combine elements
        Optional<Integer> sum = numbers.stream()
            .reduce((a, b) -> a + b);
        sum.ifPresent(s -> System.out.println("Sum: " + s));
        
        // Alternative reduce with identity
        int sum2 = numbers.stream()
            .reduce(0, Integer::sum);
        System.out.println("Sum (with identity): " + sum2);
        
        // count: Count elements
        long evenCount = numbers.stream()
            .filter(n -> n % 2 == 0)
            .count();
        System.out.println("Count of even numbers: " + evenCount);
        
        // Match operations
        boolean hasEven = numbers.stream().anyMatch(n -> n % 2 == 0);
        boolean allPositive = numbers.stream().allMatch(n -> n > 0);
        boolean noneNegative = numbers.stream().noneMatch(n -> n < 0);
        
        System.out.println("Has even: " + hasEven);
        System.out.println("All positive: " + allPositive);
        System.out.println("None negative: " + noneNegative);
        
        // Find operations
        Optional<Integer> firstEven = numbers.stream()
            .filter(n -> n % 2 == 0)
            .findFirst();
        firstEven.ifPresent(n -> System.out.println("First even: " + n));
    }
}
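
Several terminal operations return an Optional because the stream may be empty. The following sketch, with illustrative class and method names, contrasts reduce with and without an identity value and shows max with a Comparator.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class EmptyStreamResults {

    // reduce without identity: Optional.empty() when there is nothing to combine.
    static Optional<Integer> sum(List<Integer> values) {
        return values.stream().reduce(Integer::sum);
    }

    // reduce with identity: the identity itself is the result for an empty stream.
    static int sumOrZero(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // max takes a Comparator and also returns an Optional.
    static Optional<Integer> largest(List<Integer> values) {
        return values.stream().max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<Integer> none = Collections.emptyList();
        System.out.println(sum(none).isPresent());                   // false
        System.out.println(sumOrZero(none));                         // 0
        System.out.println(largest(Arrays.asList(3, 9, 4)).get());   // 9
    }
}
```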

Advanced Stream Operations

FlatMap - Flattening Nested Structures

The flatMap operation is particularly powerful when dealing with nested collections or when you need to transform each element into multiple elements.

public class FlatMapExample {
    public static void main(String[] args) {
        // Example 1: Flattening lists of lists
        List<List<String>> listOfLists = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c", "d", "e"),
            Arrays.asList("f", "g", "h")
        );
        
        // Without flatMap - this gives you Stream<List<String>>
        // With flatMap - this gives you Stream<String>
        List<String> flattenedList = listOfLists.stream()
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
        
        System.out.println("Flattened: " + flattenedList);
        
        // Example 2: Words to characters
        List<String> words = Arrays.asList("hello", "world");
        List<String> characters = words.stream()
            .flatMap(word -> word.chars()
                .mapToObj(c -> String.valueOf((char) c)))
            .collect(Collectors.toList());
        
        System.out.println("Characters: " + characters);
        
        // Example 3: Processing nested data
        List<Department> departments = Arrays.asList(
            new Department("IT", Arrays.asList("Alice", "Bob")),
            new Department("HR", Arrays.asList("Charlie", "David")),
            new Department("Finance", Arrays.asList("Eve"))
        );
        
        List<String> allEmployees = departments.stream()
            .flatMap(dept -> dept.getEmployees().stream())
            .collect(Collectors.toList());
        
        System.out.println("All employees: " + allEmployees);
    }
    
    static class Department {
        private String name;
        private List<String> employees;
        
        public Department(String name, List<String> employees) {
            this.name = name;
            this.employees = employees;
        }
        
        public List<String> getEmployees() { return employees; }
    }
}
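
A common use of flatMap is splitting text: each line maps to several words, and flatMap merges them into one stream. This is a small sketch with a hypothetical helper, distinctWords, that assumes words are separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SentenceWordsSketch {

    // Splits each line on whitespace and returns the distinct lowercase words
    // in the order they are first seen.
    static List<String> distinctWords(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+"))) // one line -> many words
            .map(String::toLowerCase)
            .distinct()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Streams are lazy", "lazy streams are fast");
        System.out.println(distinctWords(lines)); // [streams, are, lazy, fast]
    }
}
```

Using map instead of flatMap here would produce a Stream<Stream<String>>, one inner stream per line, which is rarely what is wanted.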

Collectors - Gathering Results

The Collectors utility class provides many pre-built collectors for common operations. Understanding collectors is essential for effectively using streams.

public class CollectorsExample {
    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Alice", 30, "Engineering"),
            new Person("Bob", 25, "Marketing"),
            new Person("Charlie", 35, "Engineering"),
            new Person("David", 28, "Marketing"),
            new Person("Eve", 32, "Sales")
        );
        
        // Basic collectors
        List<String> names = people.stream()
            .map(Person::getName)
            .collect(Collectors.toList());
        
        Set<String> departments = people.stream()
            .map(Person::getDepartment)
            .collect(Collectors.toSet());
        
        // Joining strings
        String allNames = people.stream()
            .map(Person::getName)
            .collect(Collectors.joining(", "));
        System.out.println("All names: " + allNames);
        
        // Grouping by department
        Map<String, List<Person>> byDepartment = people.stream()
            .collect(Collectors.groupingBy(Person::getDepartment));
        
        System.out.println("Grouped by department:");
        byDepartment.forEach((dept, persons) -> {
            System.out.println(dept + ": " + 
                persons.stream().map(Person::getName).collect(Collectors.toList()));
        });
        
        // Counting by department
        Map<String, Long> countByDepartment = people.stream()
            .collect(Collectors.groupingBy(
                Person::getDepartment,
                Collectors.counting()
            ));
        System.out.println("Count by department: " + countByDepartment);
        
        // Average age by department
        Map<String, Double> avgAgeByDepartment = people.stream()
            .collect(Collectors.groupingBy(
                Person::getDepartment,
                Collectors.averagingInt(Person::getAge)
            ));
        System.out.println("Average age by department: " + avgAgeByDepartment);
        
        // Partitioning (binary grouping)
        Map<Boolean, List<Person>> partitionedByAge = people.stream()
            .collect(Collectors.partitioningBy(person -> person.getAge() >= 30));
        
        System.out.println("30 and older: " + 
            partitionedByAge.get(true).stream()
                .map(Person::getName)
                .collect(Collectors.toList()));
    }
    
    static class Person {
        private String name;
        private int age;
        private String department;
        
        public Person(String name, int age, String department) {
            this.name = name;
            this.age = age;
            this.department = department;
        }
        
        public String getName() { return name; }
        public int getAge() { return age; }
        public String getDepartment() { return department; }
    }
}
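
Collectors can be combined. The sketch below uses plain strings instead of the Person class to stay self-contained; it shows groupingBy with a map factory and a downstream mapping collector, and toMap with a merge function. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class CollectorComposition {

    // Groups words by length; TreeMap keeps the keys sorted and
    // mapping() transforms each word before it is added to its group.
    static Map<Integer, List<String>> upperByLength(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(
                String::length,
                TreeMap::new,
                Collectors.mapping(String::toUpperCase, Collectors.toList())));
    }

    // Counts occurrences; the merge function resolves duplicate keys
    // (without it, toMap throws IllegalStateException on a repeated key).
    static Map<String, Integer> frequencies(List<String> words) {
        return words.stream()
            .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        System.out.println(upperByLength(Arrays.asList("kiwi", "fig", "plum", "pear")));
        // {3=[FIG], 4=[KIWI, PLUM, PEAR]}
        System.out.println(frequencies(Arrays.asList("a", "b", "a")).get("a")); // 2
    }
}
```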

Primitive Streams

Java provides specialized stream classes for primitive types (IntStream, LongStream, DoubleStream) to avoid the overhead of boxing and unboxing.

public class PrimitiveStreams {
    public static void main(String[] args) {
        // IntStream examples
        IntStream numbers = IntStream.rangeClosed(1, 10);
        
        // Basic statistics
        IntSummaryStatistics stats = numbers.summaryStatistics();
        System.out.println("Count: " + stats.getCount());
        System.out.println("Sum: " + stats.getSum());
        System.out.println("Average: " + stats.getAverage());
        System.out.println("Min: " + stats.getMin());
        System.out.println("Max: " + stats.getMax());
        
        // Converting between streams
        IntStream intStream = IntStream.range(1, 6);
        Stream<Integer> boxedStream = intStream.boxed(); // int -> Integer
        
        Stream<String> stringStream = Stream.of("1", "2", "3", "4", "5");
        IntStream parsedStream = stringStream.mapToInt(Integer::parseInt);
        
        // DoubleStream for calculations
        double[] values = {1.5, 2.3, 3.7, 4.1, 5.9};
        double average = Arrays.stream(values)
            .average()
            .orElse(0.0);
        System.out.println("Average: " + average);
        
        // Performance comparison (rough: a single un-warmed run; use a harness such as JMH for real measurements)
        long startTime = System.nanoTime();
        
        // Using boxed integers (slower)
        Stream.iterate(1, n -> n + 1)
            .limit(1_000_000)
            .mapToInt(Integer::intValue)
            .sum();
        
        long middleTime = System.nanoTime();
        
        // Using primitive stream (faster)
        IntStream.range(1, 1_000_001)
            .sum();
        
        long endTime = System.nanoTime();
        
        System.out.println("Boxed stream time: " + (middleTime - startTime) / 1_000_000 + " ms");
        System.out.println("Primitive stream time: " + (endTime - middleTime) / 1_000_000 + " ms");
    }
}
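
Primitive streams are most often reached from an object stream through mapToInt, mapToLong, or mapToDouble, and left again through mapToObj or boxed. A minimal sketch with illustrative method names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveMappingSketch {

    // Total number of characters across all words, computed on an IntStream.
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // average() returns OptionalDouble because the stream may be empty.
    static OptionalDouble averageLength(List<String> words) {
        return words.stream().mapToInt(String::length).average();
    }

    // mapToObj converts back to an object stream, here to build labels.
    static List<String> labels(int count) {
        return IntStream.rangeClosed(1, count)
            .mapToObj(i -> "item-" + i)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("java", "streams");
        System.out.println(totalLength(words));   // 11
        System.out.println(averageLength(words)); // OptionalDouble[5.5]
        System.out.println(labels(3));            // [item-1, item-2, item-3]
    }
}
```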

Parallel Streams

Parallel streams can potentially improve performance for CPU-intensive operations by utilizing multiple CPU cores. However, they should be used judiciously.

public class ParallelStreams {
    public static void main(String[] args) {
        List<Integer> largeList = IntStream.rangeClosed(1, 10_000_000)
            .boxed()
            .collect(Collectors.toList());
        
        // Sequential processing
        long startTime = System.nanoTime();
        long sequentialSum = largeList.stream()
            .mapToLong(Integer::longValue)
            .sum();
        long sequentialTime = System.nanoTime() - startTime;
        
        // Parallel processing
        startTime = System.nanoTime();
        long parallelSum = largeList.parallelStream()
            .mapToLong(Integer::longValue)
            .sum();
        long parallelTime = System.nanoTime() - startTime;
        
        System.out.println("Sequential result: " + sequentialSum);
        System.out.println("Parallel result: " + parallelSum);
        System.out.println("Sequential time: " + sequentialTime / 1_000_000 + " ms");
        System.out.println("Parallel time: " + parallelTime / 1_000_000 + " ms");
        System.out.println("Speedup: " + (double) sequentialTime / parallelTime);
        
        // When NOT to use parallel streams
        demonstrateParallelPitfalls();
    }
    
    private static void demonstrateParallelPitfalls() {
        List<String> words = Arrays.asList("apple", "banana", "cherry", "date");
        
        // DON'T: Using parallel streams with synchronized collections
        List<String> synchronizedList = Collections.synchronizedList(new ArrayList<>());
        words.parallelStream()
            .forEach(synchronizedList::add); // This negates parallel benefits
        
        // DON'T: For small datasets
        List<Integer> smallList = Arrays.asList(1, 2, 3, 4, 5);
        // Overhead of parallel processing exceeds benefits
        int sum = smallList.parallelStream()
            .mapToInt(Integer::intValue)
            .sum();
        
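        // DON'T: Reduce with a value that is not a true identity
        // "" is not an identity for this accumulator ("" + " " + b is " " + b, not b),
        // so every parallel chunk contributes its own extra space
        String incorrectResult = words.parallelStream()
            .reduce("", (a, b) -> a + " " + b); // Element order is kept, but spacing varies
        System.out.println("Potentially incorrect concatenation: " + incorrectResult);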
        
        // DO: Use for CPU-intensive operations on large datasets
        List<Double> calculations = IntStream.range(1, 1000)
            .parallel()
            .mapToDouble(ParallelStreams::expensiveCalculation)
            .boxed()
            .collect(Collectors.toList());
    }
    
    private static double expensiveCalculation(int n) {
        // Simulate expensive computation
        return Math.sqrt(Math.sin(n) * Math.cos(n));
    }
}
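
For parallel pipelines, collecting into a result is generally safer than mutating shared state from forEach. The sketch below shows two such patterns under the assumption of an ordered source; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelReductionSketch {

    // Collectors.joining combines partial results in encounter order,
    // so the output matches the sequential result.
    static String safeJoin(List<String> words) {
        return words.parallelStream().collect(Collectors.joining(" "));
    }

    // collect() builds partial lists per task and merges them,
    // so no shared mutable list is needed.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
            .parallel()
            .map(i -> i * i)
            .boxed()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(safeJoin(Arrays.asList("apple", "banana", "cherry", "date")));
        // apple banana cherry date
        System.out.println(squares(5)); // [1, 4, 9, 16, 25]
    }
}
```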

Real-World Examples

Data Processing Pipeline

import java.time.LocalDate;
import java.util.*;
import java.util.stream.*;

public class DataProcessingExample {
    public static void main(String[] args) {
        // Simulate reading sales data from a file or database
        List<Sale> sales = generateSampleData();
        
        // Business requirement: Find top 5 customers by total purchase amount
        // in the last quarter, excluding cancelled orders
        
        List<CustomerSummary> topCustomers = sales.stream()
            .filter(sale -> sale.getStatus() != SaleStatus.CANCELLED)
            .filter(sale -> sale.getDate().isAfter(LocalDate.now().minusMonths(3)))
            .collect(Collectors.groupingBy(
                Sale::getCustomerId,
                Collectors.summingDouble(Sale::getAmount)
            ))
            .entrySet()
            .stream()
            .map(entry -> new CustomerSummary(entry.getKey(), entry.getValue()))
            .sorted(Comparator.comparingDouble(CustomerSummary::getTotalAmount).reversed())
            .limit(5)
            .collect(Collectors.toList());
        
        System.out.println("Top 5 customers:");
        topCustomers.forEach(customer -> 
            System.out.printf("Customer %s: $%.2f%n", 
                customer.getCustomerId(), customer.getTotalAmount()));
        
        // Another requirement: Monthly sales summary
        Map<String, DoubleSummaryStatistics> monthlySummary = sales.stream()
            .filter(sale -> sale.getStatus() == SaleStatus.COMPLETED)
            .collect(Collectors.groupingBy(
                sale -> sale.getDate().getMonth().toString(),
                Collectors.summarizingDouble(Sale::getAmount)
            ));
        
        System.out.println("\nMonthly Sales Summary:");
        monthlySummary.forEach((month, stats) ->
            System.out.printf("%s: Count=%d, Total=$%.2f, Average=$%.2f%n",
                month, stats.getCount(), stats.getSum(), stats.getAverage()));
    }
    
    private static List<Sale> generateSampleData() {
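        // Sample data generation (fixed 2024 dates: adjust them to fall within the last
        // three months, otherwise the top-customer date filter above matches nothing)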
        return Arrays.asList(
            new Sale("CUST001", 150.0, LocalDate.of(2024, 1, 15), SaleStatus.COMPLETED),
            new Sale("CUST002", 200.0, LocalDate.of(2024, 1, 20), SaleStatus.COMPLETED),
            new Sale("CUST001", 75.0, LocalDate.of(2024, 2, 10), SaleStatus.CANCELLED),
            new Sale("CUST003", 300.0, LocalDate.of(2024, 2, 25), SaleStatus.COMPLETED),
            new Sale("CUST001", 125.0, LocalDate.of(2024, 3, 5), SaleStatus.COMPLETED)
        );
    }
    
    // Supporting classes
    static class Sale {
        private String customerId;
        private double amount;
        private LocalDate date;
        private SaleStatus status;
        
        public Sale(String customerId, double amount, LocalDate date, SaleStatus status) {
            this.customerId = customerId;
            this.amount = amount;
            this.date = date;
            this.status = status;
        }
        
        // Getters
        public String getCustomerId() { return customerId; }
        public double getAmount() { return amount; }
        public LocalDate getDate() { return date; }
        public SaleStatus getStatus() { return status; }
    }
    
    enum SaleStatus { COMPLETED, CANCELLED, PENDING }
    
    static class CustomerSummary {
        private String customerId;
        private double totalAmount;
        
        public CustomerSummary(String customerId, double totalAmount) {
            this.customerId = customerId;
            this.totalAmount = totalAmount;
        }
        
        public String getCustomerId() { return customerId; }
        public double getTotalAmount() { return totalAmount; }
    }
}
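
The group, total, sort, and limit steps of the pipeline can also be written without an intermediate summary class by sorting the map entries directly. This is a reduced sketch: Purchase is a hypothetical stand-in for Sale that carries only a customer id and an amount.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopCustomersSketch {

    // Hypothetical minimal sale record: customer id and amount only.
    static class Purchase {
        final String customerId;
        final double amount;

        Purchase(String customerId, double amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Totals per customer, then the ids of the n customers with the highest totals.
    static List<String> topCustomers(List<Purchase> purchases, int n) {
        Map<String, Double> totals = purchases.stream()
            .collect(Collectors.groupingBy(
                p -> p.customerId,
                Collectors.summingDouble(p -> p.amount)));

        return totals.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Purchase> purchases = Arrays.asList(
            new Purchase("CUST001", 150.0),
            new Purchase("CUST002", 200.0),
            new Purchase("CUST003", 300.0),
            new Purchase("CUST001", 125.0));
        System.out.println(topCustomers(purchases, 2)); // [CUST003, CUST001]
    }
}
```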

Best Practices and Performance Considerations

When to Use Streams

Use Streams When:

  • Processing collections with multiple transformations
  • You need readable, functional-style code
  • Complex filtering and mapping operations
  • Grouping and aggregating data
  • You want to easily switch between sequential and parallel processing

Avoid Streams When:

  • Simple iterations (traditional for-each is clearer)
  • Modifying existing collections (use traditional loops)
  • Early termination with complex conditions
  • Memory is a critical concern (streams have some overhead)
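
For the in-place modification case in the list above, a loop or the collection's own mutating methods avoid building a second collection. A small sketch using Collection.removeIf and List.replaceAll, both available since Java 8 (the class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InPlaceUpdateSketch {

    // Removes short names and upper-cases the rest, mutating the given list.
    static List<String> cleanInPlace(List<String> names) {
        names.removeIf(name -> name.length() <= 3);  // drop names of 3 chars or fewer
        names.replaceAll(String::toUpperCase);       // rewrite remaining elements
        return names;
    }

    public static void main(String[] args) {
        // Arrays.asList is fixed-size, so copy into a resizable ArrayList first.
        List<String> names = new ArrayList<>(Arrays.asList("Alice", "Bob", "Eve", "David"));
        System.out.println(cleanInPlace(names)); // [ALICE, DAVID]
    }
}
```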

Performance Tips

public class StreamPerformanceTips {
    public static void main(String[] args) {
        List<String> largeList = generateLargeStringList();
        
        // TIP 1: Use primitive streams when possible
        // BAD: Boxing overhead
        long sum1 = IntStream.range(1, 1000)
            .boxed()
            .mapToLong(Integer::longValue)
            .sum();
        
        // GOOD: No boxing
        long sum2 = IntStream.range(1, 1000)
            .asLongStream()
            .sum();
        
        // TIP 2: Filter early to reduce downstream processing
        // BAD: Upper-cases every element before most are discarded
        long count1 = largeList.stream()
            .map(String::toUpperCase)
            .filter(s -> s.startsWith("A"))
            .count();
        
        // GOOD: Cheap case-insensitive prefix check, no per-element uppercase copy
        long count2 = largeList.stream()
            .filter(s -> s.regionMatches(true, 0, "A", 0, 1))
            .count();
        
        // TIP 3: Use parallel streams for CPU-intensive operations
        // on large datasets
        if (largeList.size() > 10_000) {
            List<String> processed = largeList.parallelStream()
                .filter(s -> s.length() > 5)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        }
        
        // TIP 4: Avoid unnecessary intermediate collections
        // BAD: Creating intermediate list
        List<String> intermediate = largeList.stream()
            .filter(s -> s.length() > 3)
            .collect(Collectors.toList());
        long finalCount = intermediate.stream()
            .filter(s -> s.startsWith("B"))
            .count();
        
        // GOOD: Single stream pipeline
        long finalCount2 = largeList.stream()
            .filter(s -> s.length() > 3)
            .filter(s -> s.startsWith("B"))
            .count();
    }
    
    private static List<String> generateLargeStringList() {
        // Generate sample data (100,000 elements, so the size check in TIP 3 passes)
        return IntStream.rangeClosed(1, 100_000)
            .mapToObj(i -> "String" + i)
            .collect(Collectors.toList());
    }
}
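
A related tip is to prefer short-circuiting terminal operations when only existence matters. The sketch below counts predicate calls with a hypothetical AtomicInteger counter on a sequential stream to compare filter(...).count() > 0 against anyMatch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class ShortCircuitSketch {

    // Elements tested when existence is checked via filter(...).count() > 0.
    static int testsWithCount(int size) {
        AtomicInteger tests = new AtomicInteger();
        boolean found = IntStream.rangeClosed(1, size)
            .filter(i -> { tests.incrementAndGet(); return i % 10 == 0; })
            .count() > 0;               // count() must visit every element
        return tests.get();
    }

    // Elements tested when existence is checked via anyMatch.
    static int testsWithAnyMatch(int size) {
        AtomicInteger tests = new AtomicInteger();
        boolean found = IntStream.rangeClosed(1, size)
            .anyMatch(i -> { tests.incrementAndGet(); return i % 10 == 0; });
        return tests.get();             // stops at the first match
    }

    public static void main(String[] args) {
        System.out.println("count() > 0 tested: " + testsWithCount(1000));    // 1000
        System.out.println("anyMatch tested:    " + testsWithAnyMatch(1000)); // 10
    }
}
```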

Summary

The Java Streams API represents a paradigm shift toward functional programming in Java. Key takeaways:

Core Concepts:

  • Streams are pipelines: They process data flow, not storage
  • Lazy evaluation: Intermediate operations wait for terminal operations
  • Functional approach: Emphasizes what to do, not how to do it
  • Immutability: Operations don't modify the source

Key Benefits:

  • Readability: Declarative code is often more expressive
  • Composability: Easy to chain operations
  • Parallelization: Simple switch to parallel processing
  • Optimization: Lazy pipelines can apply several steps in one pass and stop early on short-circuiting operations

When to Use:

  • Complex data transformations
  • Filtering and mapping operations
  • Grouping and aggregating
  • When readability and maintainability matter

Performance Considerations:

  • Use primitive streams for numeric operations
  • Filter early in the pipeline
  • Consider parallel streams for large datasets and CPU-intensive operations
  • Be mindful of the overhead for simple operations

The Streams API, combined with lambda expressions and method references, enables you to write more expressive, maintainable, and often more efficient code for data processing tasks in Java.