Oracle Log Writer (LGWR) Process: In-Depth Technical Analysis

bparlayan

10 hours ago

1. Oracle Log Writer (LGWR): Definition and Critical Role

The Oracle Log Writer (LGWR) is an essential, mandatory background process vital for the fundamental operation of an Oracle database instance. In the Oracle architecture, any change made to persistent storage areas like database blocks, password files, or control files is first recorded as “redo log entries” (change vectors) in a memory area within the System Global Area (SGA) called the Redo Log Buffer. LGWR’s primary and critical task is to take these redo log entries from the Redo Log Buffer and write them permanently to the Online Redo Log (ORL) files located on disk. This process works in conjunction with other core background processes like DBWn (Database Writer), SMON (System Monitor), PMON (Process Monitor), and CKPT (Checkpoint) to ensure database integrity and consistency.

This function of LGWR is the cornerstone of ensuring Oracle database recoverability and durability. In the event of a system failure, such as an instance crash or power outage, changes resulting from committed transactions that haven’t yet been written to data files can be recovered using the information LGWR wrote to the online redo log files (instance recovery). This recovery is typically managed by the SMON process. LGWR’s timely and reliable writing of redo log entries to disk is crucial for preventing data loss and guaranteeing the “Durability” principle of ACID (Atomicity, Consistency, Isolation, Durability). Durability ensures that once a transaction is committed, its changes are permanent and protected against system failures; LGWR is central to this guarantee.

The LGWR process can run as an operating system process or as a thread within an OS process, depending on the operating system. For instance, it’s typically a separate process on Linux/UNIX systems but might be a thread in Windows environments. LGWR operates not only in database instances but also in Oracle Automatic Storage Management (Oracle ASM) instances.

The reliable and efficient operation of LGWR is a prerequisite for Oracle’s fundamental data protection and consistency mechanisms. Even the DBWn process, responsible for writing data blocks to data files, must wait for LGWR to permanently write the corresponding redo information to the ORL files (enforcing the Write-Ahead Logging – WAL protocol). This dependency highlights LGWR’s central role in Oracle’s data integrity strategy.

2. LGWR’s Primary Function: Redo Log Persistence

LGWR’s main responsibility is to sequentially write redo log entries from the Redo Log Buffer to the active Online Redo Log file or group on disk. These entries contain the change vectors necessary to reconstruct every modification made to the database (INSERT, UPDATE, DELETE, DDL operations, etc.).

Redo Log Buffer (RLB): A circular buffer memory area within the SGA. Server processes write redo entries corresponding to user transaction changes into this buffer very quickly. Its circular nature allows new entries to overwrite entries already written to disk by LGWR, enabling efficient memory usage.
Online Redo Log (ORL) Files: Physical files on disk. A database typically has multiple ORL groups, and each group can contain multiple member files (multiplexing) for redundancy and protection against disk failures. LGWR writes the same redo information concurrently to all members of the active redo log group.
Sequential Write Mechanism: LGWR writes redo entries from the buffer to the online redo log files in the order they were received. This sequential write is generally more efficient than random I/O, especially on traditional spinning disks (HDDs), as it minimizes disk head movement. However, the primary factor influencing LGWR performance is often the write latency, which is critical for COMMIT operations. While the performance difference between sequential and random writes is less pronounced on modern storage like SSDs and NVMe, write latency remains the key metric for LGWR performance.
Log Writer Workers (LGnn): In modern Oracle versions, the main LGWR process coordinates tasks like redo log buffer management, while helper processes or threads called Log Writer Workers (LGnn, e.g., LG00-LG99) perform the actual writing of redo data to disk and post completion status back to waiting server processes. This division of labor allows for parallelization of LGWR operations, especially on multi-CPU systems, improving overall throughput and preventing the main LGWR process from becoming a bottleneck.
Redo Transport Workers (TTnn): Distinct from LGWR’s local write function, Redo Transport Worker (TTnn) processes are responsible for asynchronously shipping redo data to remote standby databases in Oracle Data Guard configurations. While not directly involved in LGWR’s primary local write task, they are part of the overall redo management architecture.

3. LGWR Write Trigger Mechanisms

LGWR does not continuously write data from the Redo Log Buffer to disk. Instead, it is triggered by specific events or conditions designed to balance performance (avoiding unnecessary I/O) with data integrity and recoverability :

COMMIT Operation: When a user transaction issues a COMMIT, all redo log entries generated by that transaction must be written permanently to disk. This ensures the transaction’s durability. The user process (or server process) waits until it receives confirmation from LGWR that the relevant redo entries have been successfully written to the ORL files. This wait time is the primary cause of the common log file sync wait event in Oracle performance analysis.
Log Buffer Thresholds Exceeded: LGWR proactively triggers a write when the Redo Log Buffer reaches a certain occupancy level or accumulates a specific amount of redo data. This helps prevent the buffer from filling completely and stalling new redo generation (log buffer space wait event). Common thresholds include:
- The buffer becoming one-third (1/3) full.
- Accumulation of a specific amount (e.g., 1 Megabyte (MB)) of redo data in the buffer. This threshold relates to Oracle’s internal _log_io_size mechanism, but it’s a triggering threshold, not a user-configurable I/O size parameter. (Note: These thresholds might be dynamically adjusted based on Oracle version, platform, or LOG_BUFFER size.)
Timeout: Approximately every three seconds, LGWR periodically writes the current contents of the Redo Log Buffer to disk. This ensures that redo entries are flushed regularly, even if no commits occur or buffer thresholds aren’t met. This limits the amount of redo data that could potentially be lost in an instance crash (though only committed data is guaranteed recoverable).
DBWR Signal (Write-Ahead Logging – WAL): Before the Database Writer (DBWR or DBWn) process can write modified (dirty) data blocks from the database buffer cache to the permanent data files, it must ensure that all redo entries corresponding to those changes have been written to the online redo log files on disk. This is the fundamental rule of the Write-Ahead Logging (WAL) protocol. If DBWn needs to write a block whose redo entries are still only in the Redo Log Buffer, DBWn signals LGWR to write those redo entries and waits for LGWR to complete the write before proceeding.

These diverse triggers balance data durability, recoverability, and system performance. COMMIT ensures transaction durability; thresholds and timeouts manage buffer flow and limit potential loss; the DBWR signal enforces the critical WAL protocol for data consistency. The DBWR-LGWR dependency imposed by WAL means that LGWR performance can impact not only COMMIT latency but also the speed of writing to data files and thus buffer cache management, potentially leading to waits like free buffer waits if LGWR is slow.

The table below summarizes these triggers:

Trigger Event	Primary Purpose	Mechanism	Synchronization for User
`COMMIT` Operation	Durability	User/Server process posts LGWR	Synchronous (`log file sync` wait)
Buffer 1/3 Full Threshold	Throughput/Efficiency	Internal check by LGWR	Asynchronous
Buffer 1MB Data Threshold	Throughput/Efficiency	Internal check by LGWR	Asynchronous
~3 Second Timeout	Bounded Recovery	Timer within LGWR	Asynchronous
DBWR Signal (WAL)	Consistency (WAL)	DBWR process posts LGWR	Asynchronous (but DBWR process waits)

4. LGWR Interactions and Log Management

LGWR interacts closely with other components of Oracle’s logging and recovery architecture:

Log Switch: LGWR writes to only one active ORL group at a time. When this group fills, Oracle automatically performs a log switch. During this, LGWR pauses writing, the CKPT process updates control files and data file headers, and LGWR begins writing to the next available ORL group. This process continues cyclically. Log switches can also be triggered manually (ALTER SYSTEM SWITCH LOGFILE;).
Multiplexing and Error Handling: To minimize data loss risk and improve resilience against I/O errors, it’s strongly recommended that each ORL group has multiple member files (multiplexing), preferably on different physical disks and controllers. LGWR writes concurrently and in parallel to all members of the active group. If one member becomes corrupted or inaccessible, LGWR can continue writing to the other members and logs an error in the alert.log and LGWR trace file. However, if all members of the active group become unavailable, or if all members of the next group are inaccessible during a switch, LGWR stops, potentially halting database operations.
Archiver (ARCH/ARCn) Process Interaction: If the database is in ARCHIVELOG mode (standard for production environments, required for point-in-time recovery), the ARCH (or ARCn) processes are triggered when a log switch occurs. ARCH processes copy the filled ORL group to designated archived redo log destinations. These archived logs are essential for media recovery (after disk failure) and point-in-time recovery (PITR). LGWR’s direct interaction with ARCH occurs when LGWR needs to switch to the next ORL group, but that group hasn’t yet been archived by ARCH. In this situation, LGWR must wait for ARCH to complete, which can lead to “Checkpoint not complete” or “Cannot allocate new log” errors and performance issues. Therefore, configuring a sufficient number and size of ORL groups and ensuring archiving can keep pace with redo generation is crucial.

These interactions ensure the integrity of Oracle’s recovery architecture. Log switching provides redo stream continuity, multiplexing protects against single disk failures, and ARCH processes enable long-term retention and advanced recovery scenarios. However, these interactions can also become wait points if ORL sizing or archiving performance is not configured correctly.

5. LGWR Performance: I/O Dependency and Bottlenecks

The performance of the LGWR process is heavily dependent on the I/O performance of the disk subsystem hosting the online redo log (ORL) files, particularly the write latency. Although LGWR performs sequential writes , the synchronous nature of COMMIT operations means the time taken to complete each write (latency) is often more critical than overall throughput.

Impact of Slow Disk I/O:
- High Write Latency: If writes to ORL files take a long time to complete, LGWR slows down. This directly impacts COMMIT operations. When a user commits, the server process must wait for LGWR’s confirmation, and high disk latency extends this wait time, increasing the duration of the log file sync wait event. This is perceived directly by end-users as slower transaction response times.
- Low Throughput: If the disk subsystem cannot keep up with the volume of redo data generated by LGWR, the Redo Log Buffer can fill up quickly. Server processes trying to generate new redo entries may then have to wait for space in the buffer, resulting in log buffer space wait events. This reduces overall database throughput and performance.
log file sync Wait Event: This is one of the most common and critical wait events in Oracle performance tuning. It represents the time a session waits after issuing a COMMIT or ROLLBACK for LGWR to write the relevant redo entries to persistent storage (ORL files) and return acknowledgment. High log file sync wait times typically indicate:
- Slow disk subsystem for ORL files (especially high write latency).
- Application committing too frequently (triggering LGWR often with small transactions).
- CPU starvation or high contention for LGWR or its associated worker processes.
- I/O contention on the ORL disks from other processes (e.g., ARCH). This wait event directly impacts end-user experience, especially in OLTP systems.
Role of Fast Storage (SSD, NVMe): Placing online redo log files on storage with very low write latency, such as Solid State Drives (SSD) or Non-Volatile Memory Express (NVMe) devices, significantly reduces LGWR write latency and thus log file sync wait times. This allows COMMIT operations to complete much faster, markedly improving overall OLTP performance. Using SSD/NVMe is one of the most effective ways to resolve log file sync issues compared to traditional HDDs.

6. Methods to Improve LGWR Performance

Several strategies, spanning infrastructure and application levels, can be employed to enhance LGWR performance and consequently, COMMIT speed:

Use Fastest Storage: Place Online Redo Log (ORL) files on the storage tier with the lowest write latency available (ideally NVMe or at least enterprise-grade SSD). This is the most direct way to reduce log file sync waits.
Appropriate RAID Level and Configuration: Use RAID levels with good write performance and redundancy for ORLs, such as RAID 1 or RAID 10. Avoid RAID 5/6 due to parity calculation overhead impacting write performance. Ensure the storage controller’s write cache (especially if battery-backed) is enabled and configured correctly.
Sufficient Number and Size of ORL Groups:
- Size: Size ORL groups based on the system’s redo generation rate. Too small logs cause frequent log switches, increasing load on CKPT/ARCH and the risk of LGWR waiting for archiving. Aim for log switches every 15-30 minutes during peak load. Sizes like 1GB, 4GB, 8GB, or larger are common.
- Number: A minimum of 3 or 4 ORL groups is recommended. More groups increase the likelihood that LGWR finds a free group if ARCH is lagging.
Implement Multiplexing: Ensure each ORL group has at least two members, placed on physically separate disks and preferably different I/O controllers. This protects against single disk failures and can distribute I/O load.
Tune LOG_BUFFER Cautiously:
- This parameter sets the Redo Log Buffer size in the SGA.
- Oracle typically auto-tunes this, and defaults are often sufficient.
- Consider increasing LOG_BUFFER only if significant log buffer space waits or constantly increasing redo buffer allocation retries are observed, and the underlying I/O subsystem (log file parallel write) is fast.
- Make increases cautiously (e.g., 50% or double). An excessively large buffer (hundreds of MBs or GBs) usually yields no extra performance, wastes SGA memory, and can starve other SGA components. Values between 8MB and 256MB are typically adequate. Enabling Flashback Database might require a larger log buffer.
Optimize COMMIT Frequency (Application Level): Applications committing excessively often (e.g., per row in a loop) overload LGWR and cause high log file sync waits. Grouping operations into larger logical units of work and reducing COMMIT frequency (batch commits) can significantly improve performance.
Use NOLOGGING / UNRECOVERABLE Selectively: For specific bulk load, CTAS, or index build operations where data loss risk is acceptable (and a backup follows), using NOLOGGING or UNRECOVERABLE can drastically reduce or eliminate redo generation, easing the load on LGWR. Remember this makes these operations unrecoverable via media recovery and prevents replication to standbys. FORCE LOGGING at the database/tablespace level overrides these options.
Improve Overall Disk I/O Subsystem: Optimizing the general I/O performance of the subsystem hosting the ORLs (faster disks, better controllers, optimized filesystem settings, I/O schedulers) indirectly benefits LGWR.

7. LGWR in Exadata Environments

The Oracle Exadata platform incorporates specialized hardware and software components designed to optimize database performance, and LGWR performance benefits significantly from these optimizations.

Exadata Smart Flash Log (SFL): This is one of the most critical and effective Exadata features for improving LGWR performance.
- Mechanism: In traditional systems, LGWR writes redo directly to ORL disks (HDD/SSD) and waits for physical write completion for COMMIT acknowledgment. On Exadata, LGWR primarily writes redo entries first to the ultra-fast Flash Cache (NVMe-based flash memory) on the storage servers. Some sources suggest LGWR writes concurrently to both Flash Cache and the disk controller’s DRAM cache, with the first completion being sufficient for acknowledgment. Writing to flash has extremely low latency (microseconds). Once the write to the fast Flash Cache completes, LGWR acknowledges the COMMIT back to the database session.
- Background Write: Redo entries written to Flash Cache are subsequently written in the background and asynchronously by the Exadata storage servers to the persistent Online Redo Log files on disk (HDD or Capacity Flash).
- Benefits:
  - Dramatically reduces log file sync wait time because COMMIT acknowledgment doesn’t wait for slower disk writes; acknowledgment is based on the much faster flash write.
  - Significantly accelerates COMMIT operations and thus overall OLTP performance and throughput.
  - Effectively eliminates or greatly mitigates the disk I/O bottleneck for LGWR.
- Resource Usage: SFL uses a very small portion of the flash cache capacity (e.g., default 512 MB per cell).
Exadata RDMA Memory (XRMEM): Introduced in Exadata X8M and later generations, XRMEM is an ultra-fast cache tier based on Persistent Memory (PMEM) or DRAM within the storage servers. The Smart Flash Log mechanism leverages XRMEM for the lowest possible latency. Redo writes can be directed primarily to XRMEM, offering even lower latency than Flash Cache (e.g., 17 microseconds on X10M , 14 microseconds on X11M ). XRMEM further enhances SFL performance.
High-Speed Internal Network (RDMA over Converged Ethernet – RoCE): Exadata’s internal network connecting database and storage servers uses low-latency, high-bandwidth RDMA-enabled RoCE (or InfiniBand in older generations). This fast network allows LGWR to rapidly send redo data to XRMEM and Flash Cache on storage servers with minimal delay. RDMA enables direct memory access without OS or CPU intervention, speeding up communication.
Comparison with Non-Exadata Environments: On standard (non-Exadata) systems, LGWR writes directly to physical disks (HDD, SSD, NVMe) through OS and filesystem layers. Performance is entirely dependent on the write latency and throughput of these disks and the I/O subsystem. There’s no specialized caching and background write mechanism like Smart Flash Log. Therefore, achieving Exadata-level log file sync times on non-Exadata typically requires investing in much more expensive and faster storage hardware (usually NVMe) or implementing aggressive application-level COMMIT optimizations. Exadata addresses this bottleneck architecturally with integrated solutions like SFL and XRMEM, providing a significant performance advantage.

8. LGWR-Related Parameters

Several Oracle initialization parameters can directly or indirectly influence LGWR’s behavior. However, in most cases, default settings suffice, and direct manipulation is often discouraged.

LOG_BUFFER:
- Meaning: Defines the amount of memory allocated for the Redo Log Buffer in the SGA, specified in bytes.
- Impact: Affects how often LGWR might write (due to 1/3 full or 1MB thresholds). A buffer set too small manually might lead to frequent writes and log buffer space waits. An excessively large buffer doesn’t always improve performance, wastes SGA memory, and might starve other components. Modern Oracle versions usually auto-size this parameter (e.g., based on SGA size or CPU count, often in granule sizes). Values between 8MB and 256MB are typically sufficient. Only consider manual adjustment if AWR reports show significant log buffer space waits or high redo buffer allocation retries, and the underlying I/O (log file parallel write) is fast. Flashback Database might require a larger log buffer.
_LOG_IO_SIZE (Hidden/Internal Parameter):
- Meaning: This parameter is often associated with internal mechanisms that can influence the size of I/O requests LGWR makes to the OS. However, it is not a user-tunable parameter that directly sets the amount of data LGWR flushes per write operation. The reference in uses this term in the context of the 1MB threshold that triggers a write, not as an I/O size setting. The actual I/O size is determined dynamically based on OS block size, LOG_BUFFER size, and other internal factors.
- Impact: Manually changing hidden parameters (starting with _) is strongly discouraged and should only be done under the explicit guidance of Oracle Support. Incorrect settings can lead to severe performance issues, inconsistencies, or data corruption. LGWR’s efficiency relies on its internal algorithms grouping and writing redo optimally; manual interference is generally detrimental.
DB_WRITER_PROCESSES (Indirect Impact): The number of DBWn processes can indirectly influence how often LGWR is triggered via the DBWR signal, but it’s not a parameter used to tune LGWR itself.
ARCHIVE_LAG_TARGET (Indirect Impact): Primarily used in Data Guard, this parameter influences how often log switches should occur to meet a target data loss time (in seconds), which can indirectly affect ORL sizing and potential LGWR waits related to log switching or archiving.
FAST_START_MTTR_TARGET (Indirect Impact): This parameter sets the target instance recovery time and influences Oracle’s internal checkpoint frequency. More frequent checkpoints might lead to more frequent DBWR activity, potentially triggering LGWR more often via the WAL signal.

In summary, apart from potentially LOG_BUFFER in specific diagnosed scenarios, there are few parameters end-users should safely adjust to directly control LGWR behavior. Performance tuning should focus on optimizing the I/O subsystem (especially for ORLs), application COMMIT behavior, and leveraging platform-specific features like Exadata’s Smart Flash Log.

9. Can LGWR Write Amount Be Manually Determined?

It is not possible or recommended to manually set a fixed, specific amount of redo data that LGWR writes in each I/O operation via a parameter. The amount of data LGWR writes per flush is dynamic and depends entirely on the triggering mechanism, as detailed in Section 3:

COMMIT: LGWR must write all redo generated by the transaction up to the commit point, plus potentially other pending entries. The amount varies per transaction.
Buffer Thresholds (1/3 or 1MB): When a threshold is hit, LGWR writes the accumulated redo data currently in the buffer. The amount depends on the redo generation rate leading up to the threshold.
Timeout (~3 Seconds): LGWR writes whatever redo is currently in the buffer when the timer expires. This could be a small or large amount.
DBWR Signal: LGWR writes at least the redo required by DBWR, potentially batching other pending entries as well.

The amount written per LGWR I/O is constantly variable based on database activity, COMMIT frequency, LOG_BUFFER size, and the specific trigger.

While internal mechanisms like _LOG_IO_SIZE or potential hidden parameters might influence the size of I/O requests LGWR makes to the OS, they do not dictate the total amount of redo flushed from the buffer in response to a trigger. Attempting to manually control this dynamic, adaptive process via unsupported parameters is highly risky and likely counterproductive. LGWR’s efficiency comes from its ability to intelligently batch redo entries based on these dynamic triggers.

10. Conclusion

The Oracle Log Writer (LGWR) process is a fundamental component of the Oracle database architecture, indispensable for ensuring data durability and recoverability. By reliably writing redo log entries from the Redo Log Buffer to online redo log files, it guarantees the persistence of committed transactions and prevents data loss during instance failures.

LGWR performance is critically linked to the write latency of the underlying storage subsystem hosting the online redo log files. Slow I/O can lead to significant log file sync waits, increasing OLTP transaction times and degrading overall database performance. Optimizing LGWR performance, therefore, primarily involves deploying online redo logs on the fastest available storage (SSD/NVMe) , optimizing application COMMIT frequency , and ensuring proper ORL sizing and management.

On specialized platforms like Oracle Exadata, features such as Smart Flash Log (SFL) and XRMEM architecturally address the LGWR write bottleneck, dramatically reducing commit latency and providing a significant performance advantage over traditional systems. Understanding LGWR’s mechanisms, triggers, interactions, and performance factors is crucial for maximizing Oracle database performance while maintaining essential data integrity and recovery guarantees.