Intermediate SQL Color Coded SQL, UNIX and Database Essays

18Mar/1033

How ORACLE Uses Memory on AIX. Part 3: Locking SGA

In the previous post we discussed how ORACLE allocates shared memory for SGA in AIX and one of the conclusions was that AIX does not give all the requested memory to ORACLE instance right away but merely promises it.

While this technique allows to use memory more efficiently and you (at least temporarily), can request more memory for processes and shared segments than what AIX physically has, it also has a rather unpleasant consequence – when we get to the limit of physical memory, AIX will have no choice but to start paging memory.

Paging is not necessarily a bad thing – moving older and not-so-often used data out of memory is something that will be done rather routinely – this is how AIX keeps a healthy system. However, when SGA memory starts to page out (and, more importantly, page back in) things can go bad quickly as, well, ORACLE does not really expect SGA to be a disk based area … (ORACLE would have called it “SDA” if that was the case 😉 )

You probably know that in the vast majority of configurations, it is strongly advised to size SGA so that it fits entirely into physical memory and never pages out. The question becomes: how can we accomplish that on AIX?

Pinning ORACLE SGA into AIX Memory

It turns out that there are several ways to pin ORACLE SGA into AIX memory, some of them ORACLE-driven, some AIX-driven and a combination of both …

First of all, let’s look at what ORACLE offers.

We will start by checking ORACLE sga-related parameters:

SQL> SHOW parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga          BOOLEAN     FALSE
pre_page_sga      BOOLEAN     FALSE
sga_max_size      big INTEGER 8G
sga_target        big INTEGER 8G

The first two parameters (lock_sga and pre_page_sga) look promising and, in fact, they can be used to control how SGA memory is allocated.

Let’s look at pre_page_sga first.

Controlling memory allocation with pre_page_sga

According to ORACLE documentation, when pre_page_sga is set to true “every process that starts must access every page in SGA“. Obviously, when this happens, the entire SGA memory is used and thus allocated.

Let’s see for ourselves. Before we even begin, let’s remind ourselves how memory is allocated when this parameter is NOT set.

AIX Default Sga Settings

As you can see, in the beginning, most of the memory is under allocated (AIX promised it but did not yet deliver) as not all of the memory has been used.

After setting pre_page_sga to TRUE and restarting the database the picture changes:

AIX Pre_Page_Sga True

Notice that all segments are allocated to the MAX – that is the result of instance processes reading and touching all the memory pages during startup. This obviously has a direct effect on the time it takes to start up the database – in my environment it took ~ 40 seconds (compared to ~ 12 seconds with default settings) for a 4Gb SGA. Presumably, however, this additional time has not been wasted – all further requests to SGA memory are supposed to hit real physical memory and AIX will not need to do any additional allocations.

Still, there are two problems with this approach:

  1. Notice that the memory, although fully allocated, is NOT really pinned. That means that if AIX starts experiencing memory shortages, you can bet that it will start paging SGA memory out with all the unpleasant consequences.
  2. A somewhat unexpected consequence is that it now takes more time for any ORACLE process to start as the “touching” it is not done just during instance startup – it is happening for any new ORACLE process (i.e. dedicated server). In my environment, average database connection time went from ~ 0.2 second to ~ 0.8 second, a 4 time increase.

Given these downsides, it is really hard to find a good justification for using pre_page_sga to “load ORACLE memory in memory”. I’m guessing this parameter is probably a relict of the past, or, perhaps, a way to pre-load memory for systems that do not support real memory pinning (remember that ORACLE can run on many operating systems). But in modern AIX, I just do not see how it can be effectively used.

So, let’s move on to the next parameter – lock_sga

Controlling memory allocation with lock_sga

When lock_sga is set to true, ORACLE (based on what truss output shows), runs this additional command on a (global – think ipcs) shared memory segment:

shmctl(..., SHM_LOCK, ...)

which pins shared memory region into physical memory.

Let’s see how it works. After setting lock_sga=true, and restarting the database, here is what I see:

AIX Lock_Sga True

Notice, that memory is not only allocated fully, but is also pinned and this is really what we want to achieve. The database startup still takes more time than without this parameter (on my system, ~ 34 seconds compared to ~ 12 seconds, again, for a 4Gb SGA), but normal database connections do not suffer any longer as, beyond startup, ORACLE processes do not need to do any (major) extra stuff.

One note here: Many ORACLE documentation sources recommend to also set v_pinshm AIX vmo parameter to enable memory pinning as in:

vmo -p -o v_pinshm = 1

However this is no longer required, unless you are dealing with a really old version of ORACLE.

With versions up to 9i, ORACLE used a different call for memory pinning:

shmget(IPC_PRIVATE, shm_size, IPC_CREAT|SHM_PIN)

which required that v_pinshm is also set. As I mentioned, in 10g and beyond ORACLE uses:

shmctl(shm_id, SHM_LOCK, ...)

that completely ignores v_pinshm settings (special thanks to Leszek Leszczynski for researching this in detail). In my tests, ORACLE 10g/11g memory was pinned regardless of the value of v_pinshm. You can of course, still set it if you need it for ORACLE 9i or for other applications.

In any case, looks like setting lock_sga=true (and, v_pinshm=1, if needed) solves the problem of SGA pinning to our satisfaction – memory is pinned and everybody is happy.

But I would submit that for larger SGAs (and what SGA these days is NOT large? 🙂 ) there is an even better way to work with AIX memory and that is – using AIX large memory pages.

Using AIX large memory pages

Before discussing how to use large memory pages in AIX, let’s step back a little and discuss what exactly these pages are and what kind of pages we can allocate.

As I mentioned already, memory in AIX is controlled by a Virtual Memory Manager that divides (virtual) memory into chunks (or pages) of equal size. Traditionally, these chunks have always had a mandatory size of 4K, but recently, with the advent of machines that can handle large amounts of RAM, this started to change.

AIX on a modern hardware now supports 4 different page sizes: 4K, 64K, 16M and 16G and I outlined the difference between them in the table below:



Page size Svmon symbol Configuration Pageable How to configure How to use
4K s Traditional, Automatic YES N/A By default
64K m Automatic YES N/A By default
16M L Manual NO vmo -p -o lgpg_regions=2048 lgpg_size=16777216 chuser capabilities= CAP_BYPASS_RAC_VMM,
CAP_PROPAGATE
oracle
lock_sga=TRUE
16G L Manual NO vmo -p -o lgpg_regions=10 lgpg_size=17179869184 chuser capabilities= CAP_BYPASS_RAC_VMM,
CAP_PROPAGATE
oracle
lock_sga=TRUE

As you can see, 4K pages are still there and still default, but AIX has now also added new “default” 64K pages. Default in this context means that you do not need to do anything special to either enable these pages or use them – AIX will decide when to use 64K pages instead of 4K and this will be done completely transparently to programs (including ORACLE, of course). In fact, in modern hardware, you would most likely see 64K pages used by ORACLE SGA as (a large) SGA size will definitely warrant them.

There is also an interesting development with AIX 6.1. While AIX 5.3 can allocate 64K pages from the start, AIX 6.1 can take existing 4K memory regions and see if they can be “collapsed” from 4K to 64K pages. svmon will show “collapsed” regions as sm.

But back to memory pages. Beyond medium (64K) pages, AIX also allows to use even larger pages – 16M or 16G. However, there are two important differences here:

  1. Large pages are NOT available by default. They require extra steps to enable them and (separately) to use them
  2. Large pages are NOT pageable. Once allocated, they always stay in memory and cannot be paged in or out (which is probably a good thing, but you do need to pay special attention to how you size them).

In addition to that, not all AIX hardware will support larger pages. To see if your particular hardware supports them run:

AIX> pagesize -a

4096
65536
16777216
17179869184

The one problem with large pages is that they are somewhat cumbersome to use.

First of all, you have to pre calculate the “large page memory” size and explicitly set it with the VMM (this will take memory away from regular VMM operations and designate it to “large page” region).

AIX> vmo -p -o lgpg_regions=2048 lgpg_size=16777216
AIX> bosboot -ad /dev/hdisk0; bosboot -ad /dev/hdisk1; bosboot –ad /dev/ipldevice

Personally, I do not see it as a major issue as you have to assign a specific size for your SGA anyway, albeit now you will have to do it on 2 levels: ORACLE and AIX.

Second, even then allocated, large pages cannot be used unless you allow user to skip regular VMM allocation policies with this command:

AIX> chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE $USER

which, again, in my mind is only a minor nuisance.

(Of course, there are also bugs … in particular, ORACLE 10.2.0.4 will not use AIX large pages even if all settings are made, unless one-off patch: 7226548 is applied … But I digress …)

Anyway, now we know what large pages are, but why exactly do we want to use them?

Well, for larger SGA sizes the benefits should be fairly obviously: making page sizes larger reduces the number of pages that AIX has to manage and that makes managing memory more efficient. Think about this: for a 30 Gb SGA (which is not excessively big these days …), the number of pages is reduced from 7,864,320 (for 4K pages) or 491,520 (for 64K pages) to 1,920 if we switch to 16M pages and that is, indeed, quite a savings …

I.e. look at how this reduction affects database startup time (test results from one of my systems):

  • the 30 Gb SGA database started in ~ 6 seconds with default settings (but remember that memory is not really fully allocated)
  • lock_sga=TRUE with 64K pages changed that startup time to ~ 35 seconds
  • lock_sga=TRUE + large (16M) pages drove the startup time back to ~ 6 seconds

On top of that, once you set up large pages, you effectively shielded this memory from the rest of the system – it will not be paged out or affected by regular memory operations, which is, ultimately, what you want to achieve in most cases.

Finally, once allocated, how will “large page” memory be reported by svmon? Well, see for yourself:

AIX Large Pages

This would normally conclude AIX memory story, if not for one thing – ORACLE 11g made a major changes in this area, making SGA, in addition to PGA much more dynamic…

In the next post we are going to have some fun with AIX and ORACLE 11g memory_targets.

Comments (33) Trackbacks (0)
  1. hi , how i can know the page size currently on mmy system aix ?
    thnks

  2. By default, AIX page size is: 4k. It can grow to 64k if AIX deems necessary. Large pages (16M or 16G) need to be enabled explicitly (that is: manually).

    To find out what page sizes your system supports, run pagesize -a
    I.e.:
    pagesize -a
    4096
    65536
    16777216
    17179869184

    … and this particular system supports 4k, 64k, 16M and 16G pages

    To find out what page sizes your program uses, run i.e. svmon -P

    I.e. here I’m looking at my shell command:

    Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
    62783490 ksh 32626 8932 0 32550 N N N

    PageSize Inuse Pin Pgsp Virtual
    s 4 KB 226 4 0 150
    m 64 KB 2025 558 0 2025

    Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
    9000 d work shared library text m 1379 0 0 1379
    20002 0 work kernel segment m 646 558 0 646
    1294628 2 work process private s 120 4 0 120
    1340074 1 clnt code,/dev/hd2:3599 s 71 0 – –
    119835b f work shared library data s 30 0 0 30
    13a3c38 – clnt /dev/fslv00:1993 s 3 0 – –
    10e010e – clnt /dev/hd2:20163 s 2 0 – –

    In this picture: ‘m’ are 64k pages and ‘s’ are 4k pages … This program uses both.

  3. Maxym,

    great post.

    I have some questions: I’ve been told by SA’s and Oracle DBA’s that they managed to have AIX 5.3 use 16 MB Large Pages without setting LOCK_SGA=TRUE.

    So all required steps from OS side have been done (all this vmo, chuser stuff) but they don’t set LOCK_SGA=TRUE.

    They even claim that 11g AMM (memory_target=xxx) under AIX 5.3 uses the Large Pages configured, however I found this post here:

    http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/oracle-is-not-using-large-pages-on-aix-61-2716017

    What does this errpt output “ONE OR MORE LARGE PAGES HAS BEEN CONVERTED INTO PAGEABLE PAGES” mean? Can AIX do some “conversion” if memory requested using non-Large Pages is low?

    So can it happen that they were under the “impression” to use LargeParges since they seem to have been used but actually were converted to some “pageable pages”?

    I also found this post interesting:

    http://www.pubbs.net/201003/oracle/8051-problems-with-large-pages-on-aix-after-upgrading-to-102043.html

    I have not found a definitive answer to the question regarding LOCK_SGA in the documents I have read so far, so can you confirm that Large Pages can only be used by Oracle if LOCK_SGA=TRUE?

    Randolf

  4. Hello Randolf,

    These are some interesting questions that you ask 🙂

    I’ll start with the first one: Can ORACLE use AIX large pages without lock_sga=TRUE ?

    While I cannot confirm with 100% certainty that this is impossible (never say never!), I believe it is extremely unlikely.

    The proof is rather in the (ORACLE) code:

    I ran a bunch of tests starting ORACLE 10g instance and looking (through truss) at shared memory control operations.

    Here are the functions that ORACLE executes:

    lock_sga=FALSE

    shmget(..., ..., IPC_CREATE | ...) = ... /* Allocate shared segment */
    shmctl(..., SHM_PAGESIZE, ...) = 0       /* Set (default) size */

    lock_sga=TRUE, but no (or not enough) large pages

    shmget(..., ..., IPC_CREATE | ...) =
    shmctl(..., SHM_PAGESIZE, ...) Err#12 ENOMEM /* Tried (large) pages and failed */
    shmctl(..., SHM_PAGESIZE, ...) = 0 /* Tried (regular) pages and succeeded */
    shmctl(..., SHM_LOCK, ...)     = 0 /* Lock SGA */

    lock_sga=TRUE, enough large pages

    shmget(..., ..., IPC_CREATE | ...) =
    shmctl(..., SHM_PAGESIZE, ...) = 0 /* Tried large pages and succeeded */
    shmctl(..., SHM_LOCK, ...)     = 0 /* Locked SGA */

    The thing here is: when lock_sga=FALSE, it doesn’t matter whether “other stuff” (large pages, user capabilities etc) is set or not – the additional shmctl() functions requesting large pages are simply NOT called !

    9i behaves in a slightly different way:

    lock_sga=FALSE

    shmget(..., ..., IPC_CREATE | ...) = ...

    lock_sga=TRUE

    shmget(..., ..., IPC_CREATE | ... | SHM_PIN | SHM_LGPAGE) = ...

    but the bottom line is still the same. Additional “large page” shmget flags are NOT “added” unless you set lock_sga=TRUE (incidentally, because of different SGA locking technique used by ORACLE 9i, shared segment behaves MUCH differently from 10g/11g, which is likely a root cause for many enduring memory myths … I’ll probably make another post just on that …)

    In any case, in the end, I very much doubt that SA/DBAs claim … To me, the only “plausible” way to do it is to make “large pages” default somehow, but I’m rather skeptical that it can be done. If you would like to read more, I found this IBM white paper pretty useful: http://www-03.ibm.com/systems/resources/systems_p_os_aix_whitepapers_multiple_page.pdf

    Anyway, without lock_sga=TRUE you cannot have SGA in large pages … Unless, of course, SA/DBAs that you mention did not refer to SGA at all … but rather to process private memory … For the latter, you actually CAN use large pages without lock_sga set … (and it is pretty easy to do: just set LDR_CNTRL environment variable to i.e.

    AIX> export LDR_CNTRL=DATAPSIZE=16M@TEXTPSIZE=16M@STACKPSIZE=4k

    before starting the instance and process data/text segments should be allocated in large pages …

    As for your second question: Can large pages be converted back to “pageable” ? – I have never seen this happening (I’m guessing you have to
    be very close to maxpin% to invoke that behavior), but I suppose it is possible (and errpt message looks legit :-)). On the other hand Can this “conversion” make an “illusion” of using large pages?” – I would guess NOT – this conversion seems to be a one way street: from large to small and you still have to set lock_sga=TRUE to get to large pages in the first place.

    Of course, I can turn this question around and say that it is very possible to configure everything for large pages and NOT use them (I guess that might qualify as “being under impression” 🙂 ) In fact, we just had that case recently when we upgraded from 10.2.0.4 to 10.2.0.5, which removed a critical “AIX large page” oneoff patch. So, when we restarted the instance after upgrade, we were “blissfully unaware” 🙂 that our instance allocated a small page SGA … Well, for a few hours, anyway …

    And finally, 11g memory_target and large pages – this is the hardest to believe, honestly … They seem to be designed for completely opposite purposes: LARGE_PAGES are ENFORCING that SGA is never released, while MEMORY_TARGET is making SGA play nice and decrease in size when necessary …

    Cheers,
    Maxym Kharchenko

  5. Maxym,

    great reply, thanks a lot for your time and the detailed explanation.

    I forgot to mention that this is about Oracle 11.1.0.7 – so any thoughts if above behaviour regarding the shmctl() calls with LargePages request could be different there (I doubt it myself but may be you’ve done the same analysis with 11.1.0.7 by any chance)?

    Also thanks for the reminder that it might be that they see LargePages being consumed but not by the SGA but PGA – as I understand this would be possible with a corresponding environment variable setting of LDR_CNTRL.

    What I meant by the “illusion” was something like this: Let’s pretend the system was under memory pressure and performed such a conversion – would it be possible to have the corresponding tools (like “svmon”) show that “LargePages” have been used but actually they have been used to be converted to “pageable pages”?

    Cheers,
    Randolf

  6. Randolf,

    No problem, I’m enjoying it 🙂

    Incidentally, I have shm* function traces for 11.1.0.7 database – they are essentially the same as in 10g with one minor difference – in 11.1.0.7 ORACLE adds one additional (permission) flag: S_IWGRP when shmget() is called … but this should hardly matter here. As far as shm* behavior is concerned 10.2 and 11.1 are exactly the same …

    “Conversion” question is much harder, of course … I was able to reproduce the error, but it was under very unusual circumstances and only for “unused” memory:

    When you allocate large pages, AIX gives them to you gradually and you can monitor this process if you run say vmstat -P 16m 1. When you get close to maxpin%, the large page allocation command usually fails before allocating “the complete” request with message similar to:

    vmo: 1485-124 Error setting lgpg_size lgpg_regions
    The system was unable to completely satisfy the request

    In other words, you might request say 128 pages (with maxpin% limit being at 192) and vmo command will fail while allocating, say, only 95. But the curious thing is that when you try to repeat the same command it will sometimes succeed and give you a few more pages.

    I ran page allocation command (vmo -o lgpg_regions=128 -o lgpg_size=…) in a loop and was able to gradually move from 95 to 108 pages. Then another interesting thing happened: looks like 108 was an equilibrium point for my system as I was able to move to 109, but then the system would spontaneously jump back to 108, releasing one large page:

    pgsz memory page
    —– ————————– ————————————
    siz avm fre re pi po fr sr cy

    16M 109 108 1 0 0 0 0 0 0
    16M 109 108 1 0 0 0 0 0 0
    16M 109 108 1 0 0 0 0 0 0
    16M 109 108 1 0 0 0 0 0 0
    16M 108 108 0 0 0 0 0 0 0
    16M 108 108 0 0 0 0 0 0 0
    16M 108 108 0 0 0 0 0 0 0
    16M 109 108 1 0 0 0 0 0 0
    16M 109 108 1 0 0 0 0 0 0

    Every such “jump back” would display the errpt “conversion” message that you mentioned … The “jump backs” were actually pretty consistent in my case, happening ~ every minute or so, but always involving only 1 (last?) page.

    However, notice that at the time of “jump” all the pages were although allocated, but still “free”. Once I actually “used” the 109 large pages by the test program – the jumps stopped (and I collected stats for something like an hour).

    I tried also to reduce maxpin% and lgpg_regions on allocated and used large page shared segment (also in the tight loop), but to no avail – looks like once the large pages are used, AIX does not give them away … and so, I’m rather doubtful that (allocated and “used”) large page SGA can “loose” pages to conversion …

    But of course, this might still be possible …

    Just my 2c …

    Regards,
    Maxym Kharchenko

  7. Maxym,

    thanks again for your detailed reply. My question regarding the “Large Pages” conversion was more about this scenario:

    – LargePages configured via VMO etc.
    – But Oracle is not using LOCK_SGA=TRUE which very likely means that Oracle does not attempt to use the pre-configured LargePages (at least for the SGA, but see below ORACLE_SGA_PGSZ variable)
    – Now Oracle uses “conventional” memory but LargePages memory has been configured and put aside and therefore the system eventually is under memory pressure
    – Can now this conversion happen to satisfy the “conventional” memory demand from Oracle?

    Regarding the particular case: I’m now one step further: I’ve got confirmation that they set the environment variable ORACLE_SGA_PGSZ=16m (which is described as workaround for bug 7226548 – Large Pages not being used in 10.2.0.4) before starting the Oracle instance.

    So is it possible that by using that “workaround” it is possible to use LargePages for the SGA without using LOCK_SGA=TRUE?

    I still have to ask why they don’t simply set LOCK_SGA=TRUE, but I’m curious if this workaround allows to use LargePages for the SGA without setting LOCK_SGA=TRUE – and in particular what happens when using AMM with this setting??

    By the way: They don’t use LDR_CNTRL for the text and data segments.

    Randolf

  8. Randolf,

    I just tested it and you’ve been right on both counts.

    1. With ORACLE_SGA_PGSZ=16m ORACLE DOES use large pages without lock_sga=TRUE. Looks like this variable triggers the same code path in the database as both

    shmctl(…, SHM_PAGESIZE, …) /* 16 Mb page */
    shmctl(…, SHM_LOCK, …)

    are called and, of course, SGA is using large pages after that … (and I agree with you, if it has the same effect as lock_sga, why not simply use lock_sga?)

    2. Allocated (but not used) large pages DO get released when the system is under memory pressure. Allocated and USED large pages do NOT.

    3. But the most interesting thing happened with ORACLE_SGA_PGSZ=16m and memory_target set. The database starts! and (large page!) SGA is allocated with the full memory_target size. I’m not sure that’s what ORACLE intended to happen (based on the fact that they do not allow lock_sga and memory_target to be both set) but it is curious situation indeed …

    I’ll research it in some more detail once I get back from a short vacation …

    I guess, you learn something new everyday … and my apologies for misleading you a bit initially.

    Regards,
    Maxym Kharchenko

  9. Maxym,

    thank you very much for the prompt reply and taking the time to test this odd scenario…

    I suspect that the reason for not using LOCK_SGA is that they use this “trick” also when running with AMM – and since AMM doesn’t work with LOCK_SGA the question is now: Does using the environment variable really mean the same (in terms of code paths) as using LOCK_SGA… I’m not really convinced.

    So this “workaround” in form of the environment variable seems to allow using LargePages in a way it wasn’t intended by Oracle – it would be interesting to know if this could have some side effects, in particular when using it together with AMM (after all now the PGA part of the memory_target seems to be allocated from LargePages as well…).

    The client has a new 750 model with P7 cores running under AIX 6.1 and we get some odd latch contention in the database that doesn’t show up when running the same database and load with P6 cores under AIX 5.3.

    Theoretically the P7 cores should have a lot more CPU power than the P6 cores – still it looks like we have a CPU starvation issue but no-one can explain it so far. May be this configuration using the environment variable (but no AMM, simply SGA_TARGET + PGA_AGGREGATE_TARGET) has some side effects under AIX 6.1, but of course I can’t prove it yet – however in the beginning the new system with P7 under AIX 6.1 was run with AMM (and this environment variable) and they got weird ORA-4030/4031 errors which let them turn off AMM. I don’t have hands on the exact error messages but it made me curious why they got these errors when the same (odd) configuration (same amount of memory, only difference is AIX 6.1 and P7 cores) was running without any memory errors under AIX 5.3 with P6 cores.

    Have a nice vacation – and if you find some time afterwards for more research… I’m all ears 🙂

    By the way: You confirmed above “2. Allocated (but not used) large pages DO get released when the system is under memory pressure”. How does this look like in tools like SVMON? Does it show LargeParges as being in use (although being converted / released) or is it clear from the output that the LargePages have been released / converted?

    Cheers,
    Randolf

  10. Randolf,

    Thanks, the vacation was nice (but very short) …

    Anyway, I’ve run some limited testing for memory_target with and without ORACLE_SGA_PGSZ set and what seems to happen is:

    Without ORACLE_SGA_PGSZ set:

    ORACLE PGA and (more importantly) SGA can be sized up and down, depending on workload (or forced parameters) and their sum roughly totals memory_target.

    I.e. this is what SGA memory (memory_target=2000M, sga_target=1500M, pga_aggregate_target=200M) looks like some ‘filler’ workload ran in it:

    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
       11c3519  m       0.00       0.00       0.00       0.00
       117d813  m      11.25       0.00       0.00      11.25
       11e60dd  m      97.69       0.00       0.00      97.69
       1124a13  m     118.50       0.00       0.00     118.50
       112c990  m     256.00       0.00       0.00     256.00
       10b428e  m     256.00       0.00       0.00     256.00
       13b0d3e  m     256.00       0.00       0.00     256.00
       11fdedd  m     256.00       0.00       0.00     256.00
    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
        TOTAL:       1251.44       0.00       0.00    1251.44

    Notice that ORACLE allocates all 8 shared segments for SGA (up to memory_target), so that SGA can potentially reach the state where SGA:100% and PGA: 0, but, of course, this is not likely to happen.

    Now, if we force SGA to resize:

    ALTER system SET sga_target=500M;
    ALTER system SET pga_aggregate_target=1200M;

    then SGA memory will look like this:

    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
       11c3519  m       0.00       0.00       0.00       0.00
       13b0d3e  m       0.00       0.00       0.00       0.00
       112c990  m       0.00       0.00       0.00       0.00
       11e60dd  m       1.69       0.00       0.00       1.69
       117d813  m      11.25       0.00       0.00      11.25
       1124a13  m     119.12       0.00       0.00     119.12
       11fdedd  m     160.00       0.00       0.00     160.00
       10b428e  m     256.00       0.00       0.00     256.00
    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
        TOTAL:        548.06       0.00       0.00     548.06

    Notice that ORACLE deallocated some of the SGA memory to give space to PGA and this is expected behavior.

    Now let’s test what happens when we use memory_target with ORACLE_SGA_PGSZ=16M

    When we start the database, its SGA memory looks like this:

    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
       10ffece  L     224.00     224.00       0.00     224.00
       1053885  L     256.00     256.00       0.00     256.00
       1355874  L     256.00     256.00       0.00     256.00
       119bd9b  L     256.00     256.00       0.00     256.00
       1077c43  L     256.00     256.00       0.00     256.00
       121f3e5  L     256.00     256.00       0.00     256.00
       117d313  L     256.00     256.00       0.00     256.00
       1300472  L     256.00     256.00       0.00     256.00
    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
        TOTAL:       2016.00    2016.00       0.00    2016.00

    Yes, it is using large pages but it is also allocated to the MAX (that is completely taking what is set in memory_target). Can this memory be released by resetting SGA and PGA as in previous example?

    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
       10ffece  L     224.00     224.00       0.00     224.00
       1053885  L     256.00     256.00       0.00     256.00
       1355874  L     256.00     256.00       0.00     256.00
       119bd9b  L     256.00     256.00       0.00     256.00
       1077c43  L     256.00     256.00       0.00     256.00
       121f3e5  L     256.00     256.00       0.00     256.00
       117d313  L     256.00     256.00       0.00     256.00
       1300472  L     256.00     256.00       0.00     256.00
    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
        TOTAL:       2016.00    2016.00       0.00    2016.00

    Well, not really – SGA allocations stands firm and does not give a shred of memory back (even though SGA size is reported as ~500M in v$memory_dynamic_components) … Notice that PGA memory is NOT allocated from this (memory_target) pool and also, unless LDR_CNTRL is used, is still allocated from “small pages”:

    # I’m running a number of SQL that perform large sorts with the sole purpose of ‘filling up PGA here’:

    PGA:
    ---------- ---------- ---------- ---------- ------------------------
       Vsid      InMem      Paging     Virtual          COMMAND
    ---------- ---------- ---------- ---------- ------------------------

      16253366     114.27       0.00     114.27 oracletest11
       8192058     114.29       0.00     114.29 oracletest11
      16122250     114.30       0.00     114.30 oracletest11
      38011016     114.38       0.00     114.38 oracletest11

      48496846     114.69       0.00     114.69 oracletest11
      59703300     114.72       0.00     114.72 oracletest11
            -1     200.16       0.00       0.00 TEXT SEGMENT
    ---------- ---------- ---------- ---------- ------------------------
       Vsid      InMem      Paging     Virtual          COMMAND
    ---------- ---------- ---------- ---------- ------------------------
        TOTAL:    1443.05       0.00    1242.89 Processes: 29

    SGA:

    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
       10ffece  L     224.00     224.00       0.00     224.00
       1053885  L     256.00     256.00       0.00     256.00
       1355874  L     256.00     256.00       0.00     256.00

    ---------- -- ---------- ---------- ---------- ----------
       Vsid    Pg   InMem       Pin       Paging     Virtual
    ---------- -- ---------- ---------- ---------- ----------
        TOTAL:       2016.00    2016.00       0.00    2016.00

    # Looking at one of the “dedicated server” processes:
    AIX>svmon -P 52756618
    -------------------------------------------------------------------------------
         Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
    52756618 oracle          635139   525328      208   583967      Y     N     Y

         PageSize                Inuse        Pin       Pgsp    Virtual
         s    4 KB               88579        256          0      37295
         m   64 KB                1904        561         13       1911
         L   16 MB                 126        126          0        126

        Vsid      Esid Type Description              PSize  Inuse   Pin Pgsp Virtual
     116e314  70000001 work default shmat/mmap           L     16    16    0      16 <- SGA (large pages!)
     1119515  70000000 work default shmat/mmap           L     16    16    0      16
    102cac6  70000002 work default shmat/mmap           L     16    16    0      16
     10142c2  70000007 work default shmat/mmap           L     14    14    0      14 <- SGA (large pages!)
     102a6c2        10 clnt text data BSS heap,          s  51240     0    -       -
                            /dev/fslv00:216105
     12d362c        11 work text data BSS heap           s  33039   256    0   33039 <- PGA (small pages!)

    139313b  9001000a work shared library data          s    107     0    0     107 <- PGA (small pages!)

    So, the bottom line is that with memory_target and ORACLE_SGA_PGSZ, ORACLE essentially goes back almost to 10g memory model where SGA and PGA are separate (memory_target here is essentially what sga_target was for 10g).

    I ran this configuration with some relatively heavy load and apart from the fact that ORACLE used MORE memory than what memory_target prescribed, I did not notice any other problems, certainly not ORA-04031 that you’ve seen … Must have been something else in your case …

    Now, for the question, how did I see that unallocated large pages were released?. Well, I did not see it in svmon, because, these pages were still “free” (that is: not attached to any process or shared segment) and it is kind of hard to look for “free” pages in svmon. Here is what I did instead:

    1. I allocated a large page region with vmo -o lgpg_regions=130 -o lgpg_size=16…, which at this time became full of “free” 16M pages. vmstat -P 16m 2 would show these pages as having been allocated, but not used.
    2. I then created “large page” shared segment (using this simple test program) which allocated 100 out of 130 pages (that is: 100 pages “used”, leaving 30 pages “free”). Once again, you could clearly see that in vmstat.
    3. I then created a memory pressure in the system by running the same test program to allocate and lock another (small page this time) shared memory region to the point that the system was constantly paging.
    4. At this time, I noticed (vmstat -P 16m 2) that the number of “large” pages in the system shrank from 130 to 100, eating up all the allocated but not used large pages. “LARGE PAGES CONVERTED …” errpt message was displayed at the same time.
    5. The amount of large pages in the system (as evidenced by vmstat) never went beyond 100 pages which were used, so I must conclude that unused large pages are relatively easy converted and given back to the system, while used large pages do NOT get converted back (or, at the very least, are very difficult to convert back – I ran this test until the system was thrashing and 100 used large pages were never released).

    As for whether ORACLE_SGA_PGSZ and lock_sga=TRUE invoking the same code path … The reason I think so is because the pattern of shm*() functions being called is exactly the same in both cases. Granted, to use large pages in AIX, not much needs to be done, essentially, the only thing that is needed is:

    /* Mandatory, to set (16Mb) page size */
    shmctl(, SHM_PAGESIZE,)
    /* Optional, to “pin” the pages.
        The reason this operation is optional is because
        16M pages are pinned anyway when used.
       The effect of SHM_LOCK here is to “use” them all very quickly */

    shmctl(, SHM_LOCK,)

    While I do not have access to ORACLE code, based on the way ORACLE behaves, I think the (pseudo) code must look something like:

    shmget(); /* Create SGA segment */

    IF lock_sga == TRUE AND memory_target IS SET THEN
      Error(‘Cannot set both memory_target and lock_sga”);
    END IF



    /* Set SGA page size:
       Either we set page size explicitly (by ORACLE_SGA_PGSZ)
       OR we get a list of page sizes available in the system
       OR if lock_sga=FALSE we set the “default” page size */
    ListPageSizes = (ORACLE_SGA_PGSZ IS SET) ?
      ORACLE_SGA_PGSZ : (lock_sga == TRUE) ?
        get_available_page_sizes() : get_default_page_size();

    IF lock_sga == TRUE OR ORACLE_SGA_PGSZ=16M THEN
      FOREACH Ps IN (ListPageSizes) LOOP
        EXIT IF shmctl(…, SHM_PAGESIZE, {Ps}) == SUCCESS;
      END LOOP:

      IF ALL shmctl(…, SHM_PAGESIZE, …) FAILED THEN
        Error(‘Not enough memory’);
      END IF;

      /* Lock SGA */
      shmctl(…, SHM_LOCK, …);

    END IF;

    shmat(); /* Attach SGA */

    Cheers,
    Maxym Kharchenko

  11. Hi,

    As far as I can tell from the IBM documentation, there is no need to explicitly configure large pages on AIX 6.1 with POWER6 and higher:

    sg247821, p47:
    “Starting with AIX6.1 on a POWER6-based processors the Virtual Memory Manager (VMM) can dynamically promote pages to a larger page size. This page promotion (4k to 64k) is completely transparent to the application and will be done without the need of user intervention.To see how many large pages are in use on your system, use the vmstat -l command.”

    I would love to hear if this is what you experienced when using Oracle on AIX 6.1, and whether there are any other considerations to factor in at all.

    Thanks,
    Niel

  12. Maxym ,
    could You please describe how exactly truss was used to trace ‘start of database’ process.
    I believe that the pid is unknown before process starts , so how to truss such events like start database instance ?
    Regards
    Greg

  13. Greg,

    The pid of the starting process is easy to determine if you start the database manually by sqlplus. In this case, ORACLE (dedicated) server that is attached to sqlplus will be the process that will do most of the work when starting the database. In particular, it will allocate and lock shared memory segment for SGA.

    To simplify things further before the instance is started that dedicated server will be the only ORACLE server running from that instance, so it will be easy to spot.

  14. Hello Niel,

    You are correct, in AIX 6.1 (and to some degree in AIX 5.3), OS can promote small 4k pages to larger 64K pages and this is done transparently to programs and does not require any intervention from the system administrator. We have definitely seen AIX use 64k pages for our programs and shared segments and (AIX6.1) collapsing existing 4k memory regions into 64k memory regions.

    Except … 64k are not really large pages these days 🙂 By ‘large pages’ I mean 16M and 16G sized pages. These do require separate administration steps to enable and use them and have some different usage characteristics – i.e. whey are not pageable.

    As an example, this is the snapshot of one of our systems where we have all 3 types of pages: small (4k), medium (64k) and large (16M):

    AIX> vmstat -P all
    pgsz            memory                           page
    ----- -------------------------- ------------------------------------
               siz      avm      fre    re    pi    po    fr     sr    cy
       4K  2846384  1808797   264054     0     0     0   598   1118     0
      64K   296469   294202     2547     0     0     0     0      0     0
      16M      131        0      131     0     0     0     0      0     0
  15. Maxym,

    sorry for getting back to you so late with this – I was on vacation and then quite busy with other tasks.

    First of all thank you again for this detailed analysis… I haven’t thought of the interesting side effects regarding dynamic resizing.

    As you’ve pointed out the problems that the client encountered were obviously not related to that setting, but we could convince them at least to switch to a “supported” configuration using LOCK_SGA and no environment variable.

    Thanks again and keep up the good work!

    Randolf

  16. Randolf, Thanks.

    Your questions prompted me to dig dipper into AIX internals and learn quite a few new things … Keep’em coming … 🙂

  17. Has anyone determined what size of SGA is considered the break point for changing the block size from 64K to 16M? I am looking at setting up a database with 16 GB of RAM and wonder which block size is best.

    Thanks,
    Scott

  18. Scott,

    Interesting question, which, unfortunately, I do not have a good answer for …

    My gut tells me that for 16 Gb, the main difference will not be in “page performance” but rather in “quality” or the fact that “large page” SGA is unpageable, but I do not have a firm numbers to prove that (still, this is a very interesting question …)

    As a side note, we have 16 Gb SGA both on “large pages” and regular and they both work fine for what they are used for …

  19. I found this thread very intriguing and wanted to ask a related memory question which lead me to finding this thread. I have an IBM UNIX server (6.1.0.0) running 8 small databases, 5 with SGAs of 3GB, +ASM using 2.8GB and 2 using 1.6GB. Total memory on the server is as follows:
    total memory = 31457 MB
    free memory = 2557 MB
    used memory = 28900 MB
    All databases are running 10.2.0.5.
    Whenever I attempt to start a sandbox DBA database which is confiured with an SGA_target of 1.5GB, I get the following error:
    ORA-27123: unable to attach to shared memory segment
    IBM AIX RISC System/6000 Error: 22: Invalid argument
    Additional information: 1
    Additional information: 188743695
    Someone had suggested checking large page pool for oracle user but I think the problem may be either related to SHMMAX.
    Thoughts/Suggestions…..

  20. Great article! Thanks for taking time to share.

  21. Great Site – Can’t believe I haven’t stumbled onto this before. Keep up the good work!

    Chris

  22. Thanks, Chris.

  23. I do not understand why in my system only shows 4k and 16M support. It does not shows 64k support.

    $ps -Z
    PID TTY TIME DPGSZ SPGSZ TPGSZ SHMPGSZ CMD
    8716550 pts/0 0:00 4K 4K 4K 4K ps
    2294484 pts/0 0:00 4K 4K 4K 4K ksh
    $vmo
    ksh: vmo: 0403-006 Execute permission denied.

    And I am getting a lot of ‘out of memory errors’

    P383 : ORA-04030: out of process memory when trying to allocate 184 bytes (TCHK^ad6ad6d
    Notes:
    P383 : ORA-04030: out of process memory when trying to allocate 184 bytes (TCHK^ad6ad6d6,qksqb.c.kgght) (Event Log File Line)

    Can you help me how to enable 16k support? And, if I configure large pages will it solve the problem?

  24. The OS is AIX 6.1L
    $uname -a
    AIX ddcdxpmhdb02 1 6 00F679FB4C00

  25. and Oracle is 11.2.0.2.0

  26. Hi:
    It’s strange in my test (AIX 5.3 + oracle 10.2.0.5)
    when I use the parameter pre_page_sga=true to startup a oracle instance, the SGA was not allocated totally, It seems 1 page (4KB) missing in SGA…

    cntzunxb02:/tmp$cat /tmp/inittest.ora
    audit_file_dest=’/oracle_tb/app/oracle/admin/test’
    background_dump_dest=’/oracle_tb/app/oracle/admin/test’
    core_dump_dest=’/oracle_tb/app/oracle/admin/test’
    user_dump_dest=’/oracle_tb/app/oracle/admin/test’
    db_name=’test’
    lock_sga=false
    pre_page_sga=true
    sga_max_size=367001600
    sga_target=314572800
    cntzunxb02:/tmp$sqlplus ” / as sysdba”

    SQL*Plus: Release 10.2.0.5.0 – Production on Tue Apr 16 11:25:51 2013

    Copyright © 1982, 2010, Oracle. All Rights Reserved.

    Connected to an idle instance.

    SQL> startup nomount pfile=’/tmp/inittest.ora’
    ORACLE instance started.

    Total System Global Area 369098752 bytes
    Fixed Size 2096512 bytes
    Variable Size 150995584 bytes
    Database Buffers 209715200 bytes
    Redo Buffers 6291456 bytes
    SQL> exit
    Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    cntzunxb02:/tmp$$ORACLE_HOME/bin/sysresv -d on -l test

    IPC Resources for ORACLE_SID “test” :
    Shared Memory:
    ID KEY
    2097157 0x242a8b40
    Oracle Instance alive for sid “test”
    cntzunxb02:/tmp$ipcs -bmS1|grep 0x242a8b40
    m 2097157 0x242a8b40 –rw-r—– oracle oinstall 369106944 0x82112 0x6230e
    cntzunxb02:/tmp$svmon -S $(ipcs -bmS1 | grep 0x242a8b40 | perl -pe “s/(\S+\s+){7,7}//”)

    Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
    82112 70000001 work default shmat/mmap s 65536 0 0 65536
    6230e 70000002 work default shmat/mmap s 24577 0 0 24577

    The SGA size in ipcs output is 369106944, but the output of svmon is (65536+24577)*4*1024=369102848
    1 memory page (369106944-369102848=4KB) was missing…

    Why is it happened?

    Thanks.

  27. Hello,

    This is a great discussion – but according to oracle documentation, Hugepages are only supported for linux kernel. Are they at all supported for AIX based systems?

    See:
    http://docs.oracle.com/cd/E16655_01/server.121/e17615/refrn10320.htm

  28. Hello Pascal,
    It is true “huge pages” are only supported on Linux, but similar infrastructure (pages, sized in megabytes or gigabytes) exists on every modern OS. I.e on AIX they are called “large pages”.

  29. Thanks for the very interesting article (and the whole series as well). I have noticed that is perfectly understandable even if one skips the os commands – that means that your communication skills are excellent, and that you did put a lot of effort in this … so many thanks 🙂

  30. Thanks, Alberto. I’m glad that you found my articles useful.

    Cheers,
    Maxym Kharchenko

  31. Great post, thanks Maxym. I am googling AIX 64K page usage for Oracle, and landing here. While I do see “sm” in my svmon output. And as you explain, it accounts for the collapsed 4K pages into 64K page. Could you please advise if a 64K page can be converted back to 16 × 4K pages? If true, what state of the 64K page can be in — specifically, can it be a pin 64K page? I doubt it as “pin” should mean that page is stuck in physical memory (64K page) regoin, but can’t reach a definite answer.

  32. Hello Ming, unfortunately I can’t comment on the current state of affairs as I haven’t seen AIX systems since 2010. Back then, I do not believe you could uncollapse 64k page onse it was made


Leave a comment

No trackbacks yet.