How ORACLE Uses Memory on AIX. Part 3: Locking SGA
In the previous post we discussed how ORACLE allocates shared memory for SGA in AIX and one of the conclusions was that AIX does not give all the requested memory to ORACLE instance right away but merely promises it.
While this technique allows to use memory more efficiently and you (at least temporarily), can request more memory for processes and shared segments than what AIX physically has, it also has a rather unpleasant consequence – when we get to the limit of physical memory, AIX will have no choice but to start paging memory.
Paging is not necessarily a bad thing – moving older and not-so-often used data out of memory is something that will be done rather routinely – this is how AIX keeps a healthy system. However, when SGA memory starts to page out (and, more importantly, page back in) things can go bad quickly as, well, ORACLE does not really expect SGA to be a disk based area … (ORACLE would have called it “SDA” if that was the case
)
You probably know that in the vast majority of configurations, it is strongly advised to size SGA so that it fits entirely into physical memory and never pages out. The question becomes: how can we accomplish that on AIX?
Pinning ORACLE SGA into AIX Memory
It turns out that there are several ways to pin ORACLE SGA into AIX memory, some of them ORACLE-driven, some AIX-driven and a combination of both …
First of all, let’s look at what ORACLE offers.
We will start by checking ORACLE sga-related parameters:
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga BOOLEAN FALSE
pre_page_sga BOOLEAN FALSE
sga_max_size big INTEGER 8G
sga_target big INTEGER 8G
The first two parameters (lock_sga and pre_page_sga) look promising and, in fact, they can be used to control how SGA memory is allocated.
Let’s look at pre_page_sga first.
Controlling memory allocation with pre_page_sga
According to ORACLE documentation, when pre_page_sga is set to true “every process that starts must access every page in SGA“. Obviously, when this happens, the entire SGA memory is used and thus allocated.
Let’s see for ourselves. Before we even begin, let’s remind ourselves how memory is allocated when this parameter is NOT set.
As you can see, in the beginning, most of the memory is under allocated (AIX promised it but did not yet deliver) as not all of the memory has been used.
After setting pre_page_sga to TRUE and restarting the database the picture changes:
Notice that all segments are allocated to the MAX - that is the result of instance processes reading and touching all the memory pages during startup. This obviously has a direct effect on the time it takes to start up the database – in my environment it took ~ 40 seconds (compared to ~ 12 seconds with default settings) for a 4Gb SGA. Presumably, however, this additional time has not been wasted – all further requests to SGA memory are supposed to hit real physical memory and AIX will not need to do any additional allocations.
Still, there are two problems with this approach:
- Notice that the memory, although fully allocated, is NOT really pinned. That means that if AIX starts experiencing memory shortages, you can bet that it will start paging SGA memory out with all the unpleasant consequences.
- A somewhat unexpected consequence is that it now takes more time for any ORACLE process to start as the “touching” it is not done just during instance startup – it is happening for any new ORACLE process (i.e. dedicated server). In my environment, average database connection time went from ~ 0.2 second to ~ 0.8 second, a 4 time increase.
Given these downsides, it is really hard to find a good justification for using pre_page_sga to “load ORACLE memory in memory”. I’m guessing this parameter is probably a relict of the past, or, perhaps, a way to pre-load memory for systems that do not support real memory pinning (remember that ORACLE can run on many operating systems). But in modern AIX, I just do not see how it can be effectively used.
So, let’s move on to the next parameter – lock_sga
Controlling memory allocation with lock_sga
When lock_sga is set to true, ORACLE (based on what truss output shows), runs this additional command on a (global – think ipcs) shared memory segment:
which pins shared memory region into physical memory.
Let’s see how it works. After setting lock_sga=true, and restarting the database, here is what I see:
Notice, that memory is not only allocated fully, but is also pinned and this is really what we want to achieve. The database startup still takes more time than without this parameter (on my system, ~ 34 seconds compared to ~ 12 seconds, again, for a 4Gb SGA), but normal database connections do not suffer any longer as, beyond startup, ORACLE processes do not need to do any (major) extra stuff.
One note here: Many ORACLE documentation sources recommend to also set v_pinshm AIX vmo parameter to enable memory pinning as in:
However this is no longer required, unless you are dealing with a really old version of ORACLE.
With versions up to 9i, ORACLE used a different call for memory pinning:
which required that v_pinshm is also set. As I mentioned, in 10g and beyond ORACLE uses:
that completely ignores v_pinshm settings (special thanks to Leszek Leszczynski for researching this in detail). In my tests, ORACLE 10g/11g memory was pinned regardless of the value of v_pinshm. You can of course, still set it if you need it for ORACLE 9i or for other applications.
In any case, looks like setting lock_sga=true (and, v_pinshm=1, if needed) solves the problem of SGA pinning to our satisfaction – memory is pinned and everybody is happy.
But I would submit that for larger SGAs (and what SGA these days is NOT large?
) there is an even better way to work with AIX memory and that is – using AIX large memory pages.
Using AIX large memory pages
Before discussing how to use large memory pages in AIX, let’s step back a little and discuss what exactly these pages are and what kind of pages we can allocate.
As I mentioned already, memory in AIX is controlled by a Virtual Memory Manager that divides (virtual) memory into chunks (or pages) of equal size. Traditionally, these chunks have always had a mandatory size of 4K, but recently, with the advent of machines that can handle large amounts of RAM, this started to change.
AIX on a modern hardware now supports 4 different page sizes: 4K, 64K, 16M and 16G and I outlined the difference between them in the table below:
| Page size | Svmon symbol | Configuration | Pageable | How to configure | How to use |
| 4K | s | Traditional, Automatic | YES | N/A | By default |
| 64K | m | Automatic | YES | N/A | By default |
| 16M | L | Manual | NO | vmo -p -o lgpg_regions=2048 lgpg_size=16777216 | chuser capabilities=
CAP_BYPASS_RAC_VMM, CAP_PROPAGATE oracle lock_sga=TRUE |
| 16G | L | Manual | NO | vmo -p -o lgpg_regions=10 lgpg_size=17179869184 | chuser capabilities=
CAP_BYPASS_RAC_VMM, CAP_PROPAGATE oracle lock_sga=TRUE |
As you can see, 4K pages are still there and still default, but AIX has now also added new “default” 64K pages. Default in this context means that you do not need to do anything special to either enable these pages or use them – AIX will decide when to use 64K pages instead of 4K and this will be done completely transparently to programs (including ORACLE, of course). In fact, in modern hardware, you would most likely see 64K pages used by ORACLE SGA as (a large) SGA size will definitely warrant them.
There is also an interesting development with AIX 6.1. While AIX 5.3 can allocate 64K pages from the start, AIX 6.1 can take existing 4K memory regions and see if they can be “collapsed” from 4K to 64K pages. svmon will show “collapsed” regions as sm.
But back to memory pages. Beyond medium (64K) pages, AIX also allows to use even larger pages – 16M or 16G. However, there are two important differences here:
- Large pages are NOT available by default. They require extra steps to enable them and (separately) to use them
- Large pages are NOT pageable. Once allocated, they always stay in memory and cannot be paged in or out (which is probably a good thing, but you do need to pay special attention to how you size them).
In addition to that, not all AIX hardware will support larger pages. To see if your particular hardware supports them run:
4096
65536
16777216
17179869184
The one problem with large pages is that they are somewhat cumbersome to use.
First of all, you have to pre calculate the “large page memory” size and explicitly set it with the VMM (this will take memory away from regular VMM operations and designate it to “large page” region).
AIX> bosboot -ad /dev/hdisk0; bosboot -ad /dev/hdisk1; bosboot –ad /dev/ipldevice
Personally, I do not see it as a major issue as you have to assign a specific size for your SGA anyway, albeit now you will have to do it on 2 levels: ORACLE and AIX.
Second, even then allocated, large pages cannot be used unless you allow user to skip regular VMM allocation policies with this command:
which, again, in my mind is only a minor nuisance.
(Of course, there are also bugs … in particular, ORACLE 10.2.0.4 will not use AIX large pages even if all settings are made, unless one-off patch: 7226548 is applied … But I digress …)
Anyway, now we know what large pages are, but why exactly do we want to use them?
Well, for larger SGA sizes the benefits should be fairly obviously: making page sizes larger reduces the number of pages that AIX has to manage and that makes managing memory more efficient. Think about this: for a 30 Gb SGA (which is not excessively big these days …), the number of pages is reduced from 7,864,320 (for 4K pages) or 491,520 (for 64K pages) to 1,920 if we switch to 16M pages and that is, indeed, quite a savings …
I.e. look at how this reduction affects database startup time (test results from one of my systems):
- the 30 Gb SGA database started in ~ 6 seconds with default settings (but remember that memory is not really fully allocated)
- lock_sga=TRUE with 64K pages changed that startup time to ~ 35 seconds
- lock_sga=TRUE + large (16M) pages drove the startup time back to ~ 6 seconds
On top of that, once you set up large pages, you effectively shielded this memory from the rest of the system – it will not be paged out or affected by regular memory operations, which is, ultimately, what you want to achieve in most cases.
Finally, once allocated, how will “large page” memory be reported by svmon? Well, see for yourself:
This would normally conclude AIX memory story, if not for one thing – ORACLE 11g made a major changes in this area, making SGA, in addition to PGA much more dynamic…
In the next post we are going to have some fun with AIX and ORACLE 11g memory_targets.




July 19th, 2010 - 11:10
hi , how i can know the page size currently on mmy system aix ?
thnks
July 19th, 2010 - 11:20
By default, AIX page size is: 4k. It can grow to 64k if AIX deems necessary. Large pages (16M or 16G) need to be enabled explicitly (that is: manually).
To find out what page sizes your system supports, run pagesize -a
I.e.:
pagesize -a
4096
65536
16777216
17179869184
… and this particular system supports 4k, 64k, 16M and 16G pages
To find out what page sizes your program uses, run i.e. svmon -P
I.e. here I’m looking at my shell command:
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
62783490 ksh 32626 8932 0 32550 N N N
PageSize Inuse Pin Pgsp Virtual
s 4 KB 226 4 0 150
m 64 KB 2025 558 0 2025
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
9000 d work shared library text m 1379 0 0 1379
20002 0 work kernel segment m 646 558 0 646
1294628 2 work process private s 120 4 0 120
1340074 1 clnt code,/dev/hd2:3599 s 71 0 – -
119835b f work shared library data s 30 0 0 30
13a3c38 – clnt /dev/fslv00:1993 s 3 0 – -
10e010e – clnt /dev/hd2:20163 s 2 0 – -
In this picture: ‘m’ are 64k pages and ‘s’ are 4k pages … This program uses both.
July 22nd, 2010 - 04:41
Maxym,
great post.
I have some questions: I’ve been told by SA’s and Oracle DBA’s that they managed to have AIX 5.3 use 16 MB Large Pages without setting LOCK_SGA=TRUE.
So all required steps from OS side have been done (all this vmo, chuser stuff) but they don’t set LOCK_SGA=TRUE.
They even claim that 11g AMM (memory_target=xxx) under AIX 5.3 uses the Large Pages configured, however I found this post here:
http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/oracle-is-not-using-large-pages-on-aix-61-2716017
What does this errpt output “ONE OR MORE LARGE PAGES HAS BEEN CONVERTED INTO PAGEABLE PAGES” mean? Can AIX do some “conversion” if memory requested using non-Large Pages is low?
So can it happen that they were under the “impression” to use LargeParges since they seem to have been used but actually were converted to some “pageable pages”?
I also found this post interesting:
http://www.pubbs.net/201003/oracle/8051-problems-with-large-pages-on-aix-after-upgrading-to-102043.html
I have not found a definitive answer to the question regarding LOCK_SGA in the documents I have read so far, so can you confirm that Large Pages can only be used by Oracle if LOCK_SGA=TRUE?
Randolf
July 22nd, 2010 - 19:43
Hello Randolf,
These are some interesting questions that you ask
I’ll start with the first one: Can ORACLE use AIX large pages without lock_sga=TRUE ?
While I cannot confirm with 100% certainty that this is impossible (never say never!), I believe it is extremely unlikely.
The proof is rather in the (ORACLE) code:
I ran a bunch of tests starting ORACLE 10g instance and looking (through truss) at shared memory control operations.
Here are the functions that ORACLE executes:
lock_sga=FALSE
shmctl(..., SHM_PAGESIZE, ...) = 0 /* Set (default) size */
lock_sga=TRUE, but no (or not enough) large pages
shmctl(..., SHM_PAGESIZE, ...) Err#12 ENOMEM /* Tried (large) pages and failed */
shmctl(..., SHM_PAGESIZE, ...) = 0 /* Tried (regular) pages and succeeded */
shmctl(..., SHM_LOCK, ...) = 0 /* Lock SGA */
lock_sga=TRUE, enough large pages
shmctl(..., SHM_PAGESIZE, ...) = 0 /* Tried large pages and succeeded */
shmctl(..., SHM_LOCK, ...) = 0 /* Locked SGA */
The thing here is: when lock_sga=FALSE, it doesn’t matter whether “other stuff” (large pages, user capabilities etc) is set or not – the additional shmctl() functions requesting large pages are simply NOT called !
9i behaves in a slightly different way:
lock_sga=FALSE
lock_sga=TRUE
but the bottom line is still the same. Additional “large page” shmget flags are NOT “added” unless you set lock_sga=TRUE (incidentally, because of different SGA locking technique used by ORACLE 9i, shared segment behaves MUCH differently from 10g/11g, which is likely a root cause for many enduring memory myths … I’ll probably make another post just on that …)
In any case, in the end, I very much doubt that SA/DBAs claim … To me, the only “plausible” way to do it is to make “large pages” default somehow, but I’m rather skeptical that it can be done. If you would like to read more, I found this IBM white paper pretty useful: http://www-03.ibm.com/systems/resources/systems_p_os_aix_whitepapers_multiple_page.pdf
Anyway, without lock_sga=TRUE you cannot have SGA in large pages … Unless, of course, SA/DBAs that you mention did not refer to SGA at all … but rather to process private memory … For the latter, you actually CAN use large pages without lock_sga set … (and it is pretty easy to do: just set LDR_CNTRL environment variable to i.e.
AIX> export LDR_CNTRL=DATAPSIZE=16M@TEXTPSIZE=16M@STACKPSIZE=4k
before starting the instance and process data/text segments should be allocated in large pages …
As for your second question: Can large pages be converted back to “pageable” ? – I have never seen this happening (I’m guessing you have to
). On the other hand Can this “conversion” make an “illusion” of using large pages?” – I would guess NOT - this conversion seems to be a one way street: from large to small and you still have to set lock_sga=TRUE to get to large pages in the first place.
be very close to maxpin% to invoke that behavior), but I suppose it is possible (and errpt message looks legit
Of course, I can turn this question around and say that it is very possible to configure everything for large pages and NOT use them (I guess that might qualify as “being under impression”
) In fact, we just had that case recently when we upgraded from 10.2.0.4 to 10.2.0.5, which removed a critical “AIX large page” oneoff patch. So, when we restarted the instance after upgrade, we were “blissfully unaware”
that our instance allocated a small page SGA … Well, for a few hours, anyway …
And finally, 11g memory_target and large pages – this is the hardest to believe, honestly … They seem to be designed for completely opposite purposes: LARGE_PAGES are ENFORCING that SGA is never released, while MEMORY_TARGET is making SGA play nice and decrease in size when necessary …
Cheers,
Maxym Kharchenko
July 23rd, 2010 - 15:49
Maxym,
great reply, thanks a lot for your time and the detailed explanation.
I forgot to mention that this is about Oracle 11.1.0.7 – so any thoughts if above behaviour regarding the shmctl() calls with LargePages request could be different there (I doubt it myself but may be you’ve done the same analysis with 11.1.0.7 by any chance)?
Also thanks for the reminder that it might be that they see LargePages being consumed but not by the SGA but PGA - as I understand this would be possible with a corresponding environment variable setting of LDR_CNTRL.
What I meant by the “illusion” was something like this: Let’s pretend the system was under memory pressure and performed such a conversion – would it be possible to have the corresponding tools (like “svmon”) show that “LargePages” have been used but actually they have been used to be converted to “pageable pages”?
Cheers,
Randolf
July 23rd, 2010 - 18:31
Randolf,
No problem, I’m enjoying it
Incidentally, I have shm* function traces for 11.1.0.7 database – they are essentially the same as in 10g with one minor difference – in 11.1.0.7 ORACLE adds one additional (permission) flag: S_IWGRP when shmget() is called … but this should hardly matter here. As far as shm* behavior is concerned 10.2 and 11.1 are exactly the same …
“Conversion” question is much harder, of course … I was able to reproduce the error, but it was under very unusual circumstances and only for “unused” memory:
When you allocate large pages, AIX gives them to you gradually and you can monitor this process if you run say vmstat -P 16m 1. When you get close to maxpin%, the large page allocation command usually fails before allocating “the complete” request with message similar to:
vmo: 1485-124 Error setting lgpg_size lgpg_regions
The system was unable to completely satisfy the request
In other words, you might request say 128 pages (with maxpin% limit being at 192) and vmo command will fail while allocating, say, only 95. But the curious thing is that when you try to repeat the same command it will sometimes succeed and give you a few more pages.
I ran page allocation command (vmo -o lgpg_regions=128 -o lgpg_size=…) in a loop and was able to gradually move from 95 to 108 pages. Then another interesting thing happened: looks like 108 was an equilibrium point for my system as I was able to move to 109, but then the system would spontaneously jump back to 108, releasing one large page:
pgsz memory page
—– ————————– ————————————
siz avm fre re pi po fr sr cy
16M 109 108 1 0 0 0 0 0 0
16M 109 108 1 0 0 0 0 0 0
16M 109 108 1 0 0 0 0 0 0
16M 109 108 1 0 0 0 0 0 0
16M 108 108 0 0 0 0 0 0 0
16M 108 108 0 0 0 0 0 0 0
16M 108 108 0 0 0 0 0 0 0
16M 109 108 1 0 0 0 0 0 0
16M 109 108 1 0 0 0 0 0 0
Every such “jump back” would display the errpt “conversion” message that you mentioned … The “jump backs” were actually pretty consistent in my case, happening ~ every minute or so, but always involving only 1 (last?) page.
However, notice that at the time of “jump” all the pages were although allocated, but still “free”. Once I actually “used” the 109 large pages by the test program – the jumps stopped (and I collected stats for something like an hour).
I tried also to reduce maxpin% and lgpg_regions on allocated and used large page shared segment (also in the tight loop), but to no avail – looks like once the large pages are used, AIX does not give them away … and so, I’m rather doubtful that (allocated and “used”) large page SGA can “loose” pages to conversion …
But of course, this might still be possible …
Just my 2c …
Regards,
Maxym Kharchenko
July 24th, 2010 - 15:49
Maxym,
thanks again for your detailed reply. My question regarding the “Large Pages” conversion was more about this scenario:
- LargePages configured via VMO etc.
- But Oracle is not using LOCK_SGA=TRUE which very likely means that Oracle does not attempt to use the pre-configured LargePages (at least for the SGA, but see below ORACLE_SGA_PGSZ variable)
- Now Oracle uses “conventional” memory but LargePages memory has been configured and put aside and therefore the system eventually is under memory pressure
- Can now this conversion happen to satisfy the “conventional” memory demand from Oracle?
Regarding the particular case: I’m now one step further: I’ve got confirmation that they set the environment variable ORACLE_SGA_PGSZ=16m (which is described as workaround for bug 7226548 – Large Pages not being used in 10.2.0.4) before starting the Oracle instance.
So is it possible that by using that “workaround” it is possible to use LargePages for the SGA without using LOCK_SGA=TRUE?
I still have to ask why they don’t simply set LOCK_SGA=TRUE, but I’m curious if this workaround allows to use LargePages for the SGA without setting LOCK_SGA=TRUE – and in particular what happens when using AMM with this setting??
By the way: They don’t use LDR_CNTRL for the text and data segments.
Randolf
July 24th, 2010 - 21:28
Randolf,
I just tested it and you’ve been right on both counts.
1. With ORACLE_SGA_PGSZ=16m ORACLE DOES use large pages without lock_sga=TRUE. Looks like this variable triggers the same code path in the database as both
shmctl(…, SHM_PAGESIZE, …) /* 16 Mb page */
shmctl(…, SHM_LOCK, …)
are called and, of course, SGA is using large pages after that … (and I agree with you, if it has the same effect as lock_sga, why not simply use lock_sga?)
2. Allocated (but not used) large pages DO get released when the system is under memory pressure. Allocated and USED large pages do NOT.
3. But the most interesting thing happened with ORACLE_SGA_PGSZ=16m and memory_target set. The database starts! and (large page!) SGA is allocated with the full memory_target size. I’m not sure that’s what ORACLE intended to happen (based on the fact that they do not allow lock_sga and memory_target to be both set) but it is curious situation indeed …
I’ll research it in some more detail once I get back from a short vacation …
I guess, you learn something new everyday … and my apologies for misleading you a bit initially.
Regards,
Maxym Kharchenko
July 25th, 2010 - 15:16
Maxym,
thank you very much for the prompt reply and taking the time to test this odd scenario…
I suspect that the reason for not using LOCK_SGA is that they use this “trick” also when running with AMM - and since AMM doesn’t work with LOCK_SGA the question is now: Does using the environment variable really mean the same (in terms of code paths) as using LOCK_SGA… I’m not really convinced.
So this “workaround” in form of the environment variable seems to allow using LargePages in a way it wasn’t intended by Oracle – it would be interesting to know if this could have some side effects, in particular when using it together with AMM (after all now the PGA part of the memory_target seems to be allocated from LargePages as well…).
The client has a new 750 model with P7 cores running under AIX 6.1 and we get some odd latch contention in the database that doesn’t show up when running the same database and load with P6 cores under AIX 5.3.
Theoretically the P7 cores should have a lot more CPU power than the P6 cores – still it looks like we have a CPU starvation issue but no-one can explain it so far. May be this configuration using the environment variable (but no AMM, simply SGA_TARGET + PGA_AGGREGATE_TARGET) has some side effects under AIX 6.1, but of course I can’t prove it yet – however in the beginning the new system with P7 under AIX 6.1 was run with AMM (and this environment variable) and they got weird ORA-4030/4031 errors which let them turn off AMM. I don’t have hands on the exact error messages but it made me curious why they got these errors when the same (odd) configuration (same amount of memory, only difference is AIX 6.1 and P7 cores) was running without any memory errors under AIX 5.3 with P6 cores.
Have a nice vacation – and if you find some time afterwards for more research… I’m all ears
By the way: You confirmed above “2. Allocated (but not used) large pages DO get released when the system is under memory pressure”. How does this look like in tools like SVMON? Does it show LargeParges as being in use (although being converted / released) or is it clear from the output that the LargePages have been released / converted?
Cheers,
Randolf
July 29th, 2010 - 14:27
Randolf,
Thanks, the vacation was nice (but very short) …
Anyway, I’ve run some limited testing for memory_target with and without ORACLE_SGA_PGSZ set and what seems to happen is:
Without ORACLE_SGA_PGSZ set:
ORACLE PGA and (more importantly) SGA can be sized up and down, depending on workload (or forced parameters) and their sum roughly totals memory_target.
I.e. this is what SGA memory (memory_target=2000M, sga_target=1500M, pga_aggregate_target=200M) looks like some ‘filler’ workload ran in it:
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
11c3519 m 0.00 0.00 0.00 0.00
117d813 m 11.25 0.00 0.00 11.25
11e60dd m 97.69 0.00 0.00 97.69
1124a13 m 118.50 0.00 0.00 118.50
112c990 m 256.00 0.00 0.00 256.00
10b428e m 256.00 0.00 0.00 256.00
13b0d3e m 256.00 0.00 0.00 256.00
11fdedd m 256.00 0.00 0.00 256.00
---------- -- ---------- ---------- ---------- ----------
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
TOTAL: 1251.44 0.00 0.00 1251.44
Notice that ORACLE allocates all 8 shared segments for SGA (up to memory_target), so that SGA can potentially reach the state where SGA:100% and PGA: 0, but, of course, this is not likely to happen.
Now, if we force SGA to resize:
ALTER system SET pga_aggregate_target=1200M;
then SGA memory will look like this:
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
11c3519 m 0.00 0.00 0.00 0.00
13b0d3e m 0.00 0.00 0.00 0.00
112c990 m 0.00 0.00 0.00 0.00
11e60dd m 1.69 0.00 0.00 1.69
117d813 m 11.25 0.00 0.00 11.25
1124a13 m 119.12 0.00 0.00 119.12
11fdedd m 160.00 0.00 0.00 160.00
10b428e m 256.00 0.00 0.00 256.00
---------- -- ---------- ---------- ---------- ----------
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
TOTAL: 548.06 0.00 0.00 548.06
Notice that ORACLE deallocated some of the SGA memory to give space to PGA and this is expected behavior.
Now let’s test what happens when we use memory_target with ORACLE_SGA_PGSZ=16M
When we start the database, its SGA memory looks like this:
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
10ffece L 224.00 224.00 0.00 224.00
1053885 L 256.00 256.00 0.00 256.00
1355874 L 256.00 256.00 0.00 256.00
119bd9b L 256.00 256.00 0.00 256.00
1077c43 L 256.00 256.00 0.00 256.00
121f3e5 L 256.00 256.00 0.00 256.00
117d313 L 256.00 256.00 0.00 256.00
1300472 L 256.00 256.00 0.00 256.00
---------- -- ---------- ---------- ---------- ----------
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
TOTAL: 2016.00 2016.00 0.00 2016.00
Yes, it is using large pages but it is also allocated to the MAX (that is completely taking what is set in memory_target). Can this memory be released by resetting SGA and PGA as in previous example?
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
10ffece L 224.00 224.00 0.00 224.00
1053885 L 256.00 256.00 0.00 256.00
1355874 L 256.00 256.00 0.00 256.00
119bd9b L 256.00 256.00 0.00 256.00
1077c43 L 256.00 256.00 0.00 256.00
121f3e5 L 256.00 256.00 0.00 256.00
117d313 L 256.00 256.00 0.00 256.00
1300472 L 256.00 256.00 0.00 256.00
---------- -- ---------- ---------- ---------- ----------
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
TOTAL: 2016.00 2016.00 0.00 2016.00
Well, not really – SGA allocations stands firm and does not give a shred of memory back (even though SGA size is reported as ~500M in v$memory_dynamic_components) … Notice that PGA memory is NOT allocated from this (memory_target) pool and also, unless LDR_CNTRL is used, is still allocated from “small pages”:
PGA:
---------- ---------- ---------- ---------- ------------------------
Vsid InMem Paging Virtual COMMAND
---------- ---------- ---------- ---------- ------------------------
…
16253366 114.27 0.00 114.27 oracletest11
8192058 114.29 0.00 114.29 oracletest11
16122250 114.30 0.00 114.30 oracletest11
38011016 114.38 0.00 114.38 oracletest11
…
48496846 114.69 0.00 114.69 oracletest11
59703300 114.72 0.00 114.72 oracletest11
-1 200.16 0.00 0.00 TEXT SEGMENT
---------- ---------- ---------- ---------- ------------------------
Vsid InMem Paging Virtual COMMAND
---------- ---------- ---------- ---------- ------------------------
TOTAL: 1443.05 0.00 1242.89 Processes: 29
SGA:
---------- -- ---------- ---------- ---------- ----------
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
10ffece L 224.00 224.00 0.00 224.00
1053885 L 256.00 256.00 0.00 256.00
1355874 L 256.00 256.00 0.00 256.00
…
---------- -- ---------- ---------- ---------- ----------
Vsid Pg InMem Pin Paging Virtual
---------- -- ---------- ---------- ---------- ----------
TOTAL: 2016.00 2016.00 0.00 2016.00
# Looking at one of the “dedicated server” processes:
AIX>svmon -P 52756618
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd 16MB
52756618 oracle 635139 525328 208 583967 Y N Y
PageSize Inuse Pin Pgsp Virtual
s 4 KB 88579 256 0 37295
m 64 KB 1904 561 13 1911
L 16 MB 126 126 0 126
Vsid Esid Type Description PSize Inuse Pin Pgsp Virtual
116e314 70000001 work default shmat/mmap L 16 16 0 16 <- SGA (large pages!)
1119515 70000000 work default shmat/mmap L 16 16 0 16
102cac6 70000002 work default shmat/mmap L 16 16 0 16
10142c2 70000007 work default shmat/mmap L 14 14 0 14 <- SGA (large pages!)
102a6c2 10 clnt text data BSS heap, s 51240 0 - -
/dev/fslv00:216105
12d362c 11 work text data BSS heap s 33039 256 0 33039 <- PGA (small pages!)
…
139313b 9001000a work shared library data s 107 0 0 107 <- PGA (small pages!)
…
So, the bottom line is that with memory_target and ORACLE_SGA_PGSZ, ORACLE essentially goes back almost to 10g memory model where SGA and PGA are separate (memory_target here is essentially what sga_target was for 10g).
I ran this configuration with some relatively heavy load and apart from the fact that ORACLE used MORE memory than what memory_target prescribed, I did not notice any other problems, certainly not ORA-04031 that you’ve seen … Must have been something else in your case …
Now, for the question, how did I see that unallocated large pages were released?. Well, I did not see it in svmon, because, these pages were still “free” (that is: not attached to any process or shared segment) and it is kind of hard to look for “free” pages in svmon. Here is what I did instead:
1. I allocated a large page region with vmo -o lgpg_regions=130 -o lgpg_size=16…, which at this time became full of “free” 16M pages. vmstat -P 16m 2 would show these pages as having been allocated, but not used.
2. I then created “large page” shared segment (using this simple test program) which allocated 100 out of 130 pages (that is: 100 pages “used”, leaving 30 pages “free”). Once again, you could clearly see that in vmstat.
3. I then created a memory pressure in the system by running the same test program to allocate and lock another (small page this time) shared memory region to the point that the system was constantly paging.
4. At this time, I noticed (vmstat -P 16m 2) that the number of “large” pages in the system shrank from 130 to 100, eating up all the allocated but not used large pages. “LARGE PAGES CONVERTED …” errpt message was displayed at the same time.
5. The amount of large pages in the system (as evidenced by vmstat) never went beyond 100 pages which were used, so I must conclude that unused large pages are relatively easy converted and given back to the system, while used large pages do NOT get converted back (or, at the very least, are very difficult to convert back – I ran this test until the system was thrashing and 100 used large pages were never released).
As for whether ORACLE_SGA_PGSZ and lock_sga=TRUE invoking the same code path … The reason I think so is because the pattern of shm*() functions being called is exactly the same in both cases. Granted, to use large pages in AIX, not much needs to be done, essentially, the only thing that is needed is:
shmctl(…, SHM_PAGESIZE, …)
/* Optional, to “pin” the pages.
The reason this operation is optional is because
16M pages are pinned anyway when used.
The effect of SHM_LOCK here is to “use” them all very quickly */
shmctl(…, SHM_LOCK, …)
While I do not have access to ORACLE code, based on the way ORACLE behaves, I think the (pseudo) code must look something like:
IF lock_sga == TRUE AND memory_target IS SET THEN
Error(‘Cannot set both memory_target and lock_sga”);
END IF
…
/* Set SGA page size:
Either we set page size explicitly (by ORACLE_SGA_PGSZ)
OR we get a list of page sizes available in the system
OR if lock_sga=FALSE we set the “default” page size */
ListPageSizes = (ORACLE_SGA_PGSZ IS SET) ?
ORACLE_SGA_PGSZ : (lock_sga == TRUE) ?
get_available_page_sizes() : get_default_page_size();
IF lock_sga == TRUE OR ORACLE_SGA_PGSZ=16M THEN
FOREACH Ps IN (ListPageSizes) LOOP
EXIT IF shmctl(…, SHM_PAGESIZE, {Ps}) == SUCCESS;
END LOOP:
IF ALL shmctl(…, SHM_PAGESIZE, …) FAILED THEN
Error(‘Not enough memory’);
END IF;
/* Lock SGA */
shmctl(…, SHM_LOCK, …);
END IF;
shmat(…); /* Attach SGA */
Cheers,
Maxym Kharchenko
August 12th, 2010 - 01:51
Hi,
As far as I can tell from the IBM documentation, there is no need to explicitly configure large pages on AIX 6.1 with POWER6 and higher:
sg247821, p47:
“Starting with AIX6.1 on a POWER6-based processors the Virtual Memory Manager (VMM) can dynamically promote pages to a larger page size. This page promotion (4k to 64k) is completely transparent to the application and will be done without the need of user intervention.To see how many large pages are in use on your system, use the vmstat -l command.”
I would love to hear if this is what you experienced when using Oracle on AIX 6.1, and whether there are any other considerations to factor in at all.
Thanks,
Niel
August 12th, 2010 - 08:55
Maxym ,
could You please describe how exactly truss was used to trace ‘start of database’ process.
I believe that the pid is unknown before process starts , so how to truss such events like start database instance ?
Regards
Greg
August 12th, 2010 - 09:14
Greg,
The pid of the starting process is easy to determine if you start the database manually by sqlplus. In this case, ORACLE (dedicated) server that is attached to sqlplus will be the process that will do most of the work when starting the database. In particular, it will allocate and lock shared memory segment for SGA.
To simplify things further before the instance is started that dedicated server will be the only ORACLE server running from that instance, so it will be easy to spot.
August 12th, 2010 - 09:28
Hello Niel,
You are correct, in AIX 6.1 (and to some degree in AIX 5.3), OS can promote small 4k pages to larger 64K pages and this is done transparently to programs and does not require any intervention from the system administrator. We have definitely seen AIX use 64k pages for our programs and shared segments and (AIX6.1) collapsing existing 4k memory regions into 64k memory regions.
Except … 64k are not really large pages these days
By ‘large pages’ I mean 16M and 16G sized pages. These do require separate administration steps to enable and use them and have some different usage characteristics – i.e. whey are not pageable.
As an example, this is the snapshot of one of our systems where we have all 3 types of pages: small (4k), medium (64k) and large (16M):
pgsz memory page
----- -------------------------- ------------------------------------
siz avm fre re pi po fr sr cy
4K 2846384 1808797 264054 0 0 0 598 1118 0
64K 296469 294202 2547 0 0 0 0 0 0
16M 131 0 131 0 0 0 0 0 0
September 1st, 2010 - 09:22
Maxym,
sorry for getting back to you so late with this – I was on vacation and then quite busy with other tasks.
First of all thank you again for this detailed analysis… I haven’t thought of the interesting side effects regarding dynamic resizing.
As you’ve pointed out the problems that the client encountered were obviously not related to that setting, but we could convince them at least to switch to a “supported” configuration using LOCK_SGA and no environment variable.
Thanks again and keep up the good work!
Randolf
September 2nd, 2010 - 14:55
Randolf, Thanks.
Your questions prompted me to dig dipper into AIX internals and learn quite a few new things … Keep’em coming …
September 17th, 2010 - 15:59
Has anyone determined what size of SGA is considered the break point for changing the block size from 64K to 16M? I am looking at setting up a database with 16 GB of RAM and wonder which block size is best.
Thanks,
Scott
September 19th, 2010 - 17:34
Scott,
Interesting question, which, unfortunately, I do not have a good answer for …
My gut tells me that for 16 Gb, the main difference will not be in “page performance” but rather in “quality” or the fact that “large page” SGA is unpageable, but I do not have a firm numbers to prove that (still, this is a very interesting question …)
As a side note, we have 16 Gb SGA both on “large pages” and regular and they both work fine for what they are used for …