
Optimizing Oracle Database for Large Transactional Systems

10. 04. 2012 · 4 min read

Last year we took over administration of an Oracle database for one of our clients — a large Czech financial institution. The system was processing over a million transactions per day and slowly dying. Response times were getting worse every month, and batch jobs couldn’t finish overnight. Here’s the story of how we turned it around.

Diagnosis: AWR Reports Don’t Lie

The first thing you do when an Oracle database isn’t performing as it should is look at AWR (Automatic Workload Repository) reports. In our case it was like opening a medical report — a clear picture of the problem.

Top wait events showed massive 'db file sequential read' waits, a classic symptom of bad execution plans doing far too much single-block index I/O. Buffer cache hit ratio was at 87%, which for an OLTP system with a 64 GB SGA is terrible. And shared pool free memory was hovering around 2%, pointing to constant hard parsing.
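
You don't need a full AWR report just to re-check the picture; the cumulative dynamic performance views are enough. A minimal sketch of the kind of checks involved (assumes SELECT privileges on the V$ views; counters are cumulative since instance startup, AWR snapshots give the same data per interval):

  -- Top non-idle wait events since startup.
  SELECT *
    FROM (SELECT event, wait_class, total_waits,
                 ROUND(time_waited_micro / 1000000) AS seconds_waited
            FROM v$system_event
           WHERE wait_class <> 'Idle'
           ORDER BY time_waited_micro DESC)
   WHERE ROWNUM <= 10;

  -- Buffer cache hit ratio from the instance-wide counters.
  SELECT ROUND(100 * (1 - phy.value / (db.value + con.value)), 1) AS hit_pct
    FROM v$sysstat phy, v$sysstat db, v$sysstat con
   WHERE phy.name = 'physical reads cache'
     AND db.name  = 'db block gets from cache'
     AND con.name = 'consistent gets from cache';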

Step 1: Stabilizing Execution Plans

The Oracle optimizer is smart, but sometimes too smart. In production we kept seeing cases where, after gathering fresh statistics, execution plans changed radically: queries that ran in a second suddenly took minutes.

Solution: SQL Plan Baselines. We captured the good execution plans and locked them in using DBMS_SPM. New plans must be verified before the optimizer is allowed to use them. A somewhat conservative approach, but in a banking system you don't want surprises.
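
For illustration, capturing a known-good plan from the cursor cache looks roughly like this; the SQL_ID is a made-up placeholder, in practice you take it from AWR or V$SQL:

  DECLARE
    n PLS_INTEGER;
  BEGIN
    n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
           sql_id  => 'abc123xyz9k4f',  -- hypothetical SQL_ID
           fixed   => 'YES',            -- prefer this plan over new ones
           enabled => 'YES');
    DBMS_OUTPUT.PUT_LINE(n || ' plan(s) captured');
  END;
  /

  -- Use existing baselines, but don't auto-capture new ones.
  ALTER SYSTEM SET optimizer_use_sql_plan_baselines     = TRUE;
  ALTER SYSTEM SET optimizer_capture_sql_plan_baselines = FALSE;

Plans the optimizer finds later are stored as unaccepted and only become usable once they pass verification (DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE).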

The second step was setting up pending statistics. New statistics are first gathered into a pending state, tested on a staging environment, and only then published in production. Yes, it's more work. Yes, it's worth it.
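
The workflow, sketched against an illustrative APP.TRANSACTIONS table:

  -- Gather new statistics into a pending state instead of publishing.
  EXEC DBMS_STATS.SET_TABLE_PREFS('APP', 'TRANSACTIONS', 'PUBLISH', 'FALSE');
  EXEC DBMS_STATS.GATHER_TABLE_STATS('APP', 'TRANSACTIONS');

  -- In a test session, let the optimizer see the pending stats
  -- and re-run the critical queries to compare plans.
  ALTER SESSION SET optimizer_use_pending_statistics = TRUE;

  -- Plans look good? Publish. Otherwise DELETE_PENDING_STATS.
  EXEC DBMS_STATS.PUBLISH_PENDING_STATS('APP', 'TRANSACTIONS');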

Step 2: Partitioning — Foundation for Large Tables

The main transaction table had 800 million rows and was growing. Without partitioning it was like looking for a needle in a haystack, even with an index. Partitioning is a separately licensed option on top of Enterprise Edition (and therefore expensive), but for tables of this size it's a necessity.

We chose range partitioning by month on the TRANSACTION_DATE column. Older partitions (>2 years) are compressed using COMPRESS FOR OLTP. The result: partition pruning reduced I/O for most queries by 90%, because a typical query asks for data from the last month.

Archiving is now trivial — ALTER TABLE DROP PARTITION instead of DELETE with millions of rows. Batch jobs that previously ran for 6 hours now take 40 minutes.
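
A condensed sketch of the DDL, with illustrative names and only a few partitions shown (on 11g, an INTERVAL clause can create the monthly partitions automatically):

  CREATE TABLE transactions (
    transaction_id   NUMBER        NOT NULL,
    transaction_date DATE          NOT NULL,
    account_id       NUMBER,
    amount           NUMBER(12,2)
  )
  PARTITION BY RANGE (transaction_date) (
    PARTITION p_2012_01 VALUES LESS THAN (DATE '2012-02-01'),
    PARTITION p_2012_02 VALUES LESS THAN (DATE '2012-03-01'),
    PARTITION p_2012_03 VALUES LESS THAN (DATE '2012-04-01')
  );

  -- Compress a partition once it ages out of hot use (rebuilds it
  -- in place; local index partitions need a rebuild afterwards).
  ALTER TABLE transactions MOVE PARTITION p_2012_01 COMPRESS FOR OLTP;

  -- Archiving: drop the whole partition instead of a mass DELETE.
  ALTER TABLE transactions DROP PARTITION p_2012_01;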

Step 3: Index Strategy

A classic mistake: the table had 23 indexes, 8 of which were unused. Every INSERT had to update all 23 indexes — and with millions of inserts per day, that cost performance.

We went through V$SQL_PLAN and identified the indexes actually being used. We dropped eight, replaced three unnecessarily wide composite indexes with more targeted ones, and added two function-based indexes for frequent queries using TRUNC(transaction_date).
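
Roughly how the dead weight can be found; names are illustrative, and since V$SQL_PLAN only sees plans still in the shared pool, sample it over a longer period (or check DBA_HIST_SQL_PLAN) before dropping anything:

  -- Indexes on the table that appear in no cached execution plan.
  SELECT index_name
    FROM dba_indexes
   WHERE table_owner = 'APP'
     AND table_name  = 'TRANSACTIONS'
  MINUS
  SELECT object_name
    FROM v$sql_plan
   WHERE object_owner = 'APP'
     AND operation LIKE 'INDEX%';

  -- Function-based index matching the TRUNC(transaction_date) predicates.
  CREATE INDEX ix_trx_day ON transactions (TRUNC(transaction_date));

Index usage monitoring (ALTER INDEX ... MONITORING USAGE, read back via V$OBJECT_USAGE) is a useful cross-check before the actual drop.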

Result: INSERT throughput increased by 35%. And, paradoxically, SELECT queries got faster too, because the optimizer had fewer access paths to consider and stopped choosing the wrong ones.

Step 4: PGA and Temp Management

Batch jobs were generating massive sorts and hash joins that spilled to disk (temp tablespace). PGA_AGGREGATE_TARGET was set to 2 GB, which for a server with 128 GB RAM was laughable.

We raised PGA to 16 GB and moved temp tablespace to fast SSDs (replacing old SAS drives). Batch jobs that previously filled temp and crashed now run cleanly in memory.
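
The settings and the follow-up checks, sketched (the temp query assumes an 8 KB block size):

  -- Raise the PGA target; the value matches what we ended up with.
  ALTER SYSTEM SET pga_aggregate_target = 16G SCOPE = BOTH;

  -- How healthy is PGA allocation now?
  SELECT name, value
    FROM v$pgastat
   WHERE name IN ('total PGA allocated',
                  'cache hit percentage',
                  'over allocation count');

  -- Who is using temp right now, and how much (8 KB blocks assumed).
  SELECT s.sid, s.username, u.tablespace,
         u.blocks * 8192 / 1024 / 1024 AS mb
    FROM v$tempseg_usage u
    JOIN v$session s ON s.saddr = u.session_addr;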

Step 5: Connection Pooling and Application-Side Changes

On the application side (Java EE, GlassFish) we found that the connection pool was configured to a maximum of 200 connections, but an average of 15 were being used. The remaining connections were just consuming session memory on the Oracle side.

We reduced the pool to 50, added statement caching (PreparedStatement cache in the JDBC driver), and enabled implicit statement caching on the Oracle side. Hard parses dropped from 500/s to 20/s.
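
The parse-rate drop is easy to verify straight from V$SYSSTAT; a quick sketch (counters are cumulative, so sample twice and subtract for a per-second rate, or read it off the next AWR report):

  SELECT name, value
    FROM v$sysstat
   WHERE name IN ('parse count (total)',
                  'parse count (hard)',
                  'session cursor cache hits');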

Results After Three Months

  • Average response time: from 2.3s to 0.4s
  • Buffer cache hit ratio: from 87% to 99.2%
  • Batch job completion: from 6h to 40min
  • Hard parses: from 500/s to 20/s
  • Temp tablespace usage: from 98% to 15%

No hardware upgrade. No new licenses. Just proper configuration and an understanding of what Oracle actually does under the hood.

Lessons Learned

Oracle Database is a powerful tool, but it requires care. Read AWR reports regularly — not just when there’s a fire. Invest time in partitioning for large tables. And above all: before buying new hardware, check whether you’re using what you already have correctly.

oracle · database · performance · enterprise