Intermediate SQL Color Coded SQL, UNIX and Database Essays

7Mar/1066

How ORACLE Uses Memory on AIX. Part 1: Processes

In this post I am going to talk about how ORACLE allocates and uses memory when running on AIX, but I will also talk about the power of approximation and how it can sometimes be misused for ill purposes 😉

On the outset, ORACLE/AIX memory “deal” seems simple enough – obviously, ORACLE will use memory when it runs and many AIX commands (such as vmstat or ps) will show memory usage both system wide and specific to particular process. But, as always, the devil is in the details and the effect of those “details” may be far from subtle.

So, why don’t we go ahead and find that devil, shall we ?

The straightforward (but naive) approach

Let’s say that we have an ORACLE instance running on one of our AIX servers. How do we find out how much memory it is using ?

First of all, let’s consider what ORACLE memory consists of. Here we actually have 2 separate parts:

  1. Memory used by System Global Area or SGA, that is implemented in AIX as a shared memory segment.
  2. Total memory used by ORACLE processes that represent the instance.

Next, let’s determine how much memory is used by each part.

Fortunately, SGA size is readily available -> just run ipcs -bm command and it’s there (The size of shared segment should match sga_max_size in 10g or memory_max_target in 11g).

Calculating “process” memory is a bit more involved as we need to sum memory used by ALL instance processes, but it is not very difficult either. ps -l command will show individual process memory size and a simple awk script can sum all of them up, such as:

AIX> ps -elf |
egrep " oracle${ORACLE_SID} | ora_.*_${ORACLE_SID} " |
grep -v egrep | awk '{sum += $10} END {print sum/1024}'

The sum of SGA size and process size will obviously represent total memory usage by ORACLE instance and you can see the entire calculation in the slide below.

Well, let me tell you right away where the punch line is – this calculation is WRONG – in a typical case, it overestimates ORACLE memory usage by, at least, a factor of 5-10 or more.

But where exactly have we made a mistake ?

Have we identified ORACLE memory parts incorrectly ? No, ORACLE memory does indeed take 2 parts: SGA and combined per-process memory from instance processes.

Have we made a mistake in the summing formula somewhere ? Well, not really, the formula is trivial – not much room for errors here …

The reason our answer is wrong is more subtle and is related to the fact that modern operating systems (AIX included) employ a number of smart tricks to allocate and manage system memory.

For whatever reason, most AIX commands do not take these tricks into account and display memory “sizes” as if nothing is going on (read: the way it was done in the 70s).

Nevertheless, the tricks are there and their effects are real, so let’s see how we can “out trick the trickster” and find out the real memory allocation.

We will start with process memory.

Estimating Process Memory Use in AIX

Trick #1 – Some memory may be shared

To simplify a little, memory that is used by a typical AIX process usually is divided into 2 major parts:

  1. User Data – Variables, dynamically allocated data, function parameters and return values etc
  2. Program Code (program itself, shared libraries etc This is also known as: Text)

While user data is obviously unique to each process and will change in size slightly (or not so slightly) as the process runs, the code part is different – it is static and, moreover, it is exactly the same for all programs that run it.

Since all ORACLE processes that make ORACLE instance are “instantiated” from the same binary disk image – $ORACLE_HOME/bin/oracle (you did know that, didn’t you? 😉 ), there is no reason for operating system to duplicate ORACLE code segment – instead AIX loads it in memory once and then links to each process.

Let’s prove this:

AIX CODE Segments vs DATA Segments

The interesting observation here is that under normal circumstances and especially for idle (ORACLE) processes, code segment will be much larger than data (yes, ORACLE has a big code!), reaching up to 90-95% of the size reported by PS. That means that for ORACLE processes:

ps -l usually significantly over reports memory size, and the real size is MUCH less

Alternatively, I guess, you could say that ps does report size properly but only if this was the only process in the system – you pick your poison 😉 …

A simple shortcut to see the real memory size of the process (excluding memory that is shared) is to use ps v command:

AIX> ps v 880802
PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
880802   - A    86:59 89065  7088 58020    xx 88839 52048  2.0  0.0 ora_s00

AIX> ps -elf | head -1;ps -elf | grep 880802 | grep -v grep
F       S      UID     PID    PPID   C PRI NI ADDR   SZ    WCHAN    STIME  TTY  TIME CMD
240001 A   oracle  880802    1   8  64 20 4d345400 95924   Mar 01    - 86:59 ora_s000_qaten

Here, the SIZE column shows the virtual size of process DATA segment (we will talk about what virtual means shortly) and as you can see it is much smaller (7088) than SZ (95924) that is reported by ps -l.

A more precise way is to use svmon command, that displays all process memory segments and can neatly group them into SHARED and EXCLUSIVE categories (well, there is also SYSTEM, but that is another story).

AIX> svmon -P 880802 -O segment=category -O filterprop=data

...............................................................................
EXCLUSIVE segments                   Inuse      Pin     Pgsp  Virtual
1486       22      306     1765

Vsid      Esid Type Description              PSize  Inuse   Pin Pgsp Virtual
144370        11 work text data BSS heap           s   1108     0  135  1211
7d4e3  ffffffff work application stack            s    132     0   18   150
...

...............................................................................
SHARED segments                      Inuse      Pin     Pgsp  Virtual
1528419        0   418040  1565766

Vsid      Esid Type Description              PSize  Inuse   Pin Pgsp Virtual
1e483c        10 clnt text data BSS heap,          s  13012     0    -     -
/dev/fslv04:911099
30a0  90000000 work shared library text          s   4914     0   76  9685
...

Ok, we can see now that ps -l process size is bloated because it does not take sharable segments into account. But is this the whole story ?

Not quite, there is still one other notable trick in AIX bag …

Trick #2: Some memory may be swapped away

Let me ask you this – would it be possible for the process to allocate 2 Gb of RAM on the system that only has 1 Gb of physical memory ?

The answer is: of course, and it happens every day on many systems (albeit, the ratio in this example is somewhat extreme). That is: most modern operating systems (AIX is no exception) are designed to handle workloads that require more memory than the system has.

So, right of the bat, when we are talking about memory, we may actually mean 2 quite different things:

  1. Memory that is requested by system processes (we will call it Virtual)
  2. Physical memory that the system has (we will cal it Real or Physical)

It should be obvious that if Virtual > Real, something must happen to the portion of Virtual memory that does not fit into Real memory for the system to continue working properly. What usually happens is that the excess of Virtual memory (normally the oldest or least used pages) is saved to a special area on disk called SWAP (or Paging Space in AIX).

I guess you see where I’m going with this – how do we know whether the memory allocated by ORACLE processes is really in memory or has it been swapped to disk?

Here, svmon and ps v will tell us the details again.

Let’s look at ps v shortcut first:

ps v 1282210
PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
1282210      - A     1:05 14598 19224 63308    xx 88839 52048  0.0  0.0 ora_arc

Remember that SIZE represents process Virtual size (or memory that is requested by the process). RSS represents In Memory size of CODE+DATA, while TRS represents In Memory size of CODE. So, In Memory size of DATA (what we really want to know) is RSS-TRS=11260 (Kb), which means that 19224-11260=7964 (Kb) is probably in paging space.

Once again, svmon will show a more precise information:

AIX> svmon -P 1282210 -O filtercat=exclusive -O filterprop=data

-------------------------------------------------------------------------------
Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
1282210 oracle            2815       22     1963     4806      Y     N     N

Notice that the Virtual size is 4806 (4K pages) and 4806*4=19224, so we have a match here. InUse (In Memory) size is 2815 (4K pages) and 2815*4=11260, so we have a match again.

Yet, look at Pgsp statistics. 1963*4=7852 which does not match our estimation from ps. Here we can see that the formula:

Virtual Memory Size

Virtual Size = Size(In Memory) + Size(In Paging Space)

is not precise all the time – my guess is: some of paging space may be reserved even though the page is really in memory, conversely, there might be some resident memory leftovers after pages have been moved to paging space. In either case, svmon will generally provide a more accurate estimation (and you can see why ps v is only a shortcut).

To summarize, to see the true memory usage by ORACLE processes in AIX, you should adjust your estimation keeping these 2 facts in mind:

  • Do not over count segments that are shared (that is – code)
  • Recognize that some process memory may be in the paging space.

If you do NOT want to do these calculations by hand, you can download omem_proc.sh and ora_mem.pl tools from this site.

Finally, is there any way to control how much memory ORACLE instance processes are using ?

How to control ORACLE process memory usage

There are several process memory controls that can be implemented.

The simplest way is to use AIX ulimits – you can set maximum memory allocation limits, separately for process data (User Data), stack and rss (virtual memory) components. You can set these (per user) settings in /etc/security/limits (or through smit) and you can view them with ulimit -a command:

AIX> ulimit -a
data(kbytes)         unlimited
stack(kbytes)        4194304
memory(kbytes)       unlimited
...

But limiting process memory usage in this way is like having a firing squad enforce parking rules – one small mistake and you are dead! Plus, ORACLE processes do not really know that they are NOT supposed to exceed AIX ulimits and some of them oftentimes (you know ORACLE …) might need to have a lot more additional memory.

So, we need another mechanism, one that is gentler and aware what ORACLE processes are doing, but at the same time, sane enough to not let ORACLE kill the system with unreasonable memory demands.

This mechanism is provided by ORACLE and there are actually two of them:

The older one – manual process memory management sets memory usage limits individually per process. MAX per-process memory allocation is controlled by parameter sort_area_size (with sort_area_retained_size acting as the required minimum). This is the only mechanism available up to ORACLE 8i and it will still be used in later versions if parameter workarea_size_policy is set to MANUAL.

The newer one – automatic process memory management sets memory usage limit collectively for ALL ORACLE server processes. It is controlled by pga_aggregate_target parameter and works when workarea_size_policy=AUTO.

A couple of things to remember about pga_aggregate_target:

  • It is an advisory upper target that ORACLE will try to enforce, but might not be able to under extraordinary circumstances (i.e. you if have 5000 concurrent active sessions that need to sort data but only allocated 200M of pga_aggregate_target – there is no way 200M target will be met)
  • It might not cover all the process memory. Rather what it covers is – various work areas: sort, hash, bitmap merge etc but if you decide to allocate another 1,000,000 item PL/SQL array in your session – ORACLE has no choice but to let you use memory for that (however, subsequent sessions will have to use less memory for sorting,hashing etc).

Ok, so I think we have a better idea now how to see the real memory usage by ORACLE processes and how to control that usage.

But what about the other (and arguably bigger) chunk of memory that ORACLE uses – SGA ? Stay tuned as we will talk about that in Part 2.

Useful Commands

# Regular AIX commands that should be available on any system ...

# ps v: Shortcut for real memory usage
AIX> ps v pid

# ps v statistics for all processes that belong to $ORACLE_SID instance
AIX> ps gvw | head -1; ps gvw |
egrep " oracle${ORACLE_SID} | ora_.*_${ORACLE_SID} " | grep -v egrep

# Detailed memory usage by process:
# All memory segments allocated to particular process
AIX> svmon -P pid

# Relevant memory segments allocated to particular process
# and categorized as SHARED/EXCLUSIVE
AIX> svmon -P pid -O filteprop=data -O segment=category

# The same information as before but summarized
AIX> svmon -P pid -O filteprop=data

# The same information as before but for all $ORACLE_SID instance processes

AIX> svmon -P $(ps -elf | egrep " oracle${ORACLE_SID} | ora_.*_${ORACLE_SID} " |
grep -v egrep | awk '{print $4}') -O segment=category -O filterprop=data

# ORACLE memory utilization scripts - you can find these files in the TOOLS section

# Total memory allocation for ORACLE instance(processes and SGA)
AIX> omem.sh $ORACLE_SID

# Memory allocation by ORACLE processes (from ps v)
AIX> omem_proc.sh $ORACLE_SID

# Memory allocation by ORACLE processes (from svmon)
# -c part is optional but will display useful session information
# (you should connect to user with SELECT ANY DICTIONARY privilege)

AIX> ora_mem.pl -i $ORACLE_SID -c db_connection

# Detailed memory report for ORACLE instance processes
# See where memory is spent: user data, stack, shared library vars etc ...
#  (you need to have a BIG screen here)
AIX> ora_mem.pl -i $ORACLE_SID -r detailed
Comments (66) Trackbacks (1)
  1. Hi Graham,

    The reason is: data is collected in several stages and system state can slightly change between each collection (i.e. a few processes might exit). In this case, “null” (not collected) data might be present for some entries.

    It is an inconsistency of my script, but it happens fairly infrequently and (as you mentioned) does not affect the tally by much, so I chose to ignore it. If it becomes troublesome, just add an additional check before printing to make sure data is defined.

    Regards,
    Maxym Kharchenko

  2. Hi Maxym
    Many thanks for the quick reply, and for all the information contained in your articles.
    Regards
    Graham

  3. Hi Maxym
    Thanks for your sharing, it is really helpful for me.
    But there is a misunderstanding in the script omem.sh (ora_mem1.2.tar.gz)
    in line 48:
    awk ‘{rss = ($7-$10); virt= $6; code=$10; c++ } END {print int((rss+code)/1024), int(virt/1024), c}’ | read p_r p_v p_c

    I think it should be:
    awk ‘{rss = ($7-$10); virt= $6; code=$10; c++ } END {print int((rss+code/c)/1024), int(virt/1024), c}’ | read p_r p_v p_c

    it is correct?

    Thanks a lot.

  4. Hi Maxym

    I follow your step in this article, and I notice when I run the command like
    svmon -P 880802 -O segment=category -O filterprop=data

    the clnt text data BSS heap, /dev/fslv04:911099 is showing in Exclusive section, not in Shared section.
    I thought that one should be Shared Code Segment so it should be Shared section.

    How come?

    Here is output I got.

    oracle@cdruhyp09d:/global/oracle/yxu/scripts:$ svmon -P 33423402 -O segment=category -O filterprop=data
    Unit: page

    ——————————————————————————-
    Pid Command Inuse Pin Pgsp Virtual
    33423402 oracle 437701 9712 0 364158

    …………………………………………………………………….
    SYSTEM segments Inuse Pin Pgsp Virtual
    12896 9664 0 12896

    Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
    20002 0 work kernel segment m 660 604 0 660
    50005 9ffffffd work shared library sm 2305 0 0 2305
    e000e 9ffffffe work shared library sm 23 0 0 23
    970095 8fffffff work private load data s 8 0 0 8

    …………………………………………………………………….
    EXCLUSIVE segments Inuse Pin Pgsp Virtual
    90341 48 0 16798

    Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
    af0faf 10 clnt text data BSS heap, s 73525 0 – –
    /dev/orabinlv:12549
    990019 90000000 work shared library text m 775 0 0 775
    9a00d8 80020014 work USLA heap sm 1635 0 0 1635
    9b001b 90020014 work shared library s 1449 0 0 1449
    b811b8 11 work text data BSS heap sm 1090 0 0 1090
    852005 9001000a work shared library data sm 108 0 0 108
    b41534 f00000002 work process private m 5 3 0 5
    bf053d ffffffff work application stack sm 33 0 0 33
    9200d2 9fffffff clnt USLA text,/dev/hd2:4167 s 18 0 – –
    a91469 8001000a work private load data sm 3 0 0 3
    9c011e fffffff4 work application stack sm 0 0 0 0
    bbfdba fffffff2 work application stack sm 0 0 0 0
    ac022e fffffff8 work application stack sm 0 0 0 0
    bc1f3c fffffff9 work application stack sm 0 0 0 0
    810383 fffffff0 work application stack sm 0 0 0 0
    bffd3e fffffff6 work application stack sm 0 0 0 0
    99015b fffffff7 work application stack sm 0 0 0 0
    92fc93 fffffffc work application stack sm 0 0 0 0
    940496 fffffff1 work application stack sm 0 0 0 0
    a703a5 fffffffb work application stack sm 0 0 0 0
    920190 fffffff3 work application stack sm 0 0 0 0
    962196 fffffffd work application stack sm 0 0 0 0
    9421d4 fffffffa work application stack sm 0 0 0 0
    921d92 fffffff5 work application stack sm 0 0 0 0
    8f20cf fffffffe work application stack sm 0 0 0 0

    …………………………………………………………………….
    SHARED segments Inuse Pin Pgsp Virtual
    334464 0 0 334464

    Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
    b81178 70000007 work default shmat/mmap m 4096 0 0 4096
    a615a6 70000009 work default shmat/mmap m 4040 0 0 4040
    b40036 7000000a work default shmat/mmap m 4021 0 0 4021
    ac182c 7000000b work default shmat/mmap m 3930 0 0 3930
    af216f 70000008 work default shmat/mmap m 3837 0 0 3837
    bc05be 70000006 work default shmat/mmap m 768 0 0 768
    8d06cf 70000000 work default shmat/mmap m 211 0 0 211
    bc1e7c 7000000c work default shmat/mmap m 1 0 0 1
    b521b5 70000004 work default shmat/mmap m 0 0 0 0
    861546 70000005 work default shmat/mmap m 0 0 0 0

    Thanks in advance.

    Kelly

  5. And I doesn’t see the match here, What did I miss? I am running Oracle 11gR2 on AIX 7.1.

    oracle@cdruhyp09d:/global/oracle/yxu/scripts:$ ps v 33423402
    PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
    33423402 – A 0:00 0 11828 305928 xx 181714 294100 0.0 4.0 oracle
    oracle@cdruhyp09d:/global/oracle/yxu/scripts:$ svmon -P 33423402 -O filtercat=exclusive -O filterprop=data
    Unit: page

    ——————————————————————————-
    Pid Command Inuse Pin Pgsp Virtual
    33423402 oracle 90341 48 0 16798
    oracle@cdruhyp09d:/global/oracle/yxu/scripts:$

    Thanks

    Kelly

  6. Hello Kelly,

    Thank you for your question. Unfortunately I do not think I am qualified to answer it anymore. Over the last 3 years I am no longer in AIX environment and while I still can talk about the basics, looking at details is no longer in my reach.

    Sorry …
    Maxym Kharchenko

  7. oh, No problem at all.

    Still thanks a lot for your help to open a door to look deeper!

  8. Hi Maxym,

    From where can I get those scripts ora_mem.pl and omem_proc.sh

    Karthick

  9. Karthick,

    You can find them in the “tools” section: http://intermediatesql.com/tools/

  10. Hello Maxym,

    I have AIX 6.1 box running Oracle RAC 10.2.0.3 and No issues during weekdays but one of the nodes gets rebooted during weekend. My observation is 100% memory utilization during weekends and 65% of memory is being used for File System Cache which is less than 20% during weekdays. The File System cache comes to below 20% if the node gets rebooted otherwise the File System Cache showing 65% till the box is rebooted. Susptecting that the node is not responding to cluster monitor due to 100% of memory utilization and it gets evicted from the cluster and initiates reboot.

    Appreciate you suggestions to control the File System Cache in AIX 6.1

    Thank you
    Srini

  11. Hello Srini,

    I have not seen AIX systems for the last 3 years, so I can only give you a couple of general recommendations.

    Using filesystem cache with ORACLE is usually a bad idea as ORACLE maintains its own data cache and AIX cache becomes an unnecessary complication. Assuming, you do not run anything else on that system and are using regular jfs2 filesystems, check ORACLE installation guide for AIX and look for mount options that disable file cache (we used “cio” mount option, but it was more than 3 years ago).

    Out of the box, filesystem cache is usually “secondary” – its pages will be dropped and reused by “computation memory” (SGA, process memory etc) if there is memory pressure in the system. So, it seems rather unlikely that file cache growth is the reason for reboots. Of course, you might want to check for memory parameters that control file cache thresholds, such as min allocation, page replacement policy etc to make sure you do not have any unreasonable settings there.

    Cheers,
    Maxym Kharchenko

  12. AIX 7.1
    Here are some fixes to help these scripts work with GRID+Oracle:

    1.) Remove the -w in the first line of the perl(.pl) scripts.

    2.) In omem.sh on line 24 remove the space between } & “

    Change:

    SMON_PID=$(ps -fu oracle | fgrep “ora_smon_${ORACLE_SID} ” | grep -v grep | awk ‘{print $2}’)

    To:

    SMON_PID=$(ps -fu oracle | fgrep “ora_smon_${ORACLE_SID}” | grep -v grep | awk ‘{print $2}’)

    3.) In the same line you may need to change “ps -fu oracle” to “ps -fu grid” if everything is run as GRID.
    This is usually on a single note GRID+ASM setup.

    4.) You can also copy omem.sh to gmem.sh and change the same line in steps 2 & 3 in this file like so:

    SMON_PID=$(ps -fu grid | fgrep “asm_smon_${ORACLE_SID}” | grep -v grep | awk ‘{print $2}’)

    This will give you GRID+ASM memory usage.

    Disclaimer: I have not done a complete verification that the changes give correct data. I am still testing. ;o)

    Thank You for the awesome scripts!

  13. Hello Kelev,

    Thanks for doing this. Unfortunately these days I don’t have access to Oracle EM (or AIX for that matter), but I’m sure many people will find it useful.

    Cheers,
    Maxym Kharchenko

  14. Kevev of course. Apologize for misspelling it in the previous comment.

  15. Hi Maxym,

    I understand you are not on Oracle or AIX now. But any idea how we can find out if paging is SGA paging or PGA paging?

    Regards
    Venki

  16. Hello Venky,

    PGA is just “process memory” or process “exclusive” segments in AIX. SGA is “shared segments”.

    svmon tool will answer your question – it can break down memory usage by segments and show which segments are “paging”. Check out my memory articles – there are a few examples of svmon usage there, in particular, to identify what segments are paging.

    Cheers,
    Maxym Kharchenko


Leave a comment