Programming Interfaces Guide, Oracle Solaris 11.1 Information Library
This section discusses the APIs used to discover and affect thread and memory placement with respect to lgroups.
The lgrp_home(3LGRP) function is used to discover thread placement.
The meminfo(2) system call is used to discover memory placement.
The MADV_ACCESS flags to the madvise(3C) function are used to affect memory allocation among lgroups.
The lgrp_affinity_set(3LGRP) function can affect thread and memory placement by setting a thread's affinity for a given lgroup.
A thread's affinities can specify an order of preference for the lgroups from which to allocate resources.
The kernel needs information about the likely pattern of an application's memory use in order to allocate memory resources efficiently.
The madvise() function and its shared object analogue madv.so.1 provide this information to the kernel.
A running process can gather memory usage information about itself by using the meminfo() system call.
The lgrp_home() function returns the home lgroup for the specified process or thread.
#include <sys/lgrp_user.h>
lgrp_id_t lgrp_home(idtype_t idtype, id_t id);
On failure, the lgrp_home() function returns -1 and sets errno. The function fails with EINVAL when the ID type is not valid. The function fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the threads. The function fails with ESRCH when the specified process or thread is not found.
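As a minimal sketch of the call described above, the following wrapper asks for the home lgroup of the calling LWP by passing P_LWPID and P_MYID. The wrapper name current_home_lgroup() is hypothetical. The ENOSYS fallback for platforms without liblgrp is an assumption added so that the sketch compiles elsewhere. On Oracle Solaris the program would be linked with -llgrp.

```c
#include <errno.h>

#if defined(__sun)
#include <sys/types.h>
#include <sys/procset.h>    /* P_LWPID, P_MYID */
#include <sys/lgrp_user.h>  /* lgrp_home(); link with -llgrp */
#endif

/*
 * Hypothetical wrapper: return the home lgroup ID of the calling LWP,
 * or -1 with errno set. Outside Solaris it always fails with ENOSYS.
 */
long
current_home_lgroup(void)
{
#if defined(__sun)
    /* lgrp_home() itself returns -1 and sets errno on failure. */
    return ((long)lgrp_home(P_LWPID, P_MYID));
#else
    errno = ENOSYS;
    return (-1);
#endif
}
```

A caller would typically check for -1 and report errno with perror() before using the returned ID.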
The madvise() function advises the kernel that a region of user virtual memory in the range starting at the address specified in addr and with length equal to the value of the len parameter is expected to follow a particular pattern of use. The kernel uses this information to optimize the procedure for manipulating and maintaining the resources associated with the specified range. Use of the madvise() function can increase system performance when used by programs that have specific knowledge of their access patterns over memory.
#include <sys/types.h>
#include <sys/mman.h>
int madvise(caddr_t addr, size_t len, int advice);
The madvise() function provides the following flags to affect how a thread's memory is allocated among lgroups:
MADV_ACCESS_DEFAULT: This flag resets the kernel's expected access pattern for the specified range to the default.
MADV_ACCESS_LWP: This flag advises the kernel that the next LWP to touch the specified address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.
MADV_ACCESS_MANY: This flag advises the kernel that many processes or LWPs will access the specified address range randomly across the system. The kernel allocates the memory and other resources for this range accordingly.
On failure, the madvise() function returns -1 and sets errno to one of the following values:
EAGAIN: Some or all of the mappings in the specified address range, from addr to addr+len, are locked for I/O.
EINVAL: The value of the addr parameter is not a multiple of the page size as returned by sysconf(3C), the length of the specified address range is less than or equal to zero, or the advice is invalid.
EIO: An I/O error occurs while reading from or writing to the file system.
ENOMEM: Addresses in the specified address range are outside the valid range for the address space of a process, or the addresses in the specified address range specify one or more pages that are not mapped.
ESTALE: The NFS file handle is stale.
The madv.so.1 shared object enables the selective configuration of virtual memory advice for launched processes and their descendants. To use the shared object, the following string must be present in the environment:
LD_PRELOAD=$LD_PRELOAD:madv.so.1
The madv.so.1 shared object applies memory advice as specified by the value of the MADV environment variable. The MADV environment variable specifies the virtual memory advice to use for all heap, shared memory, and mmap regions in the process address space. This advice is applied to all created processes. The following values of the MADV environment variable affect resource allocation among lgroups:
access_default: This value resets the kernel's expected access pattern to the default.
access_lwp: This value advises the kernel that the next LWP to touch an address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.
access_many: This value advises the kernel that many processes or LWPs will access memory randomly across the system. The kernel allocates the memory and other resources accordingly.
The value of the MADVCFGFILE environment variable is the name of a text file that contains one or more memory advice configuration entries in the form exec-name:advice-opts.
The value of exec-name is the name of an application or executable. The value of exec-name can be a full pathname, a base name, or a pattern string.
The value of advice-opts is of the form region=advice. The values of advice are the same as the values for the MADV environment variable. Replace region with any of the following legal values:
madv: Advice applies to all heap, shared memory, and mmap(2) regions in the process address space.
heap: The heap is defined to be the brk(2) area. Advice applies to the existing heap and to any additional heap memory allocated in the future.
shm: Advice applies to shared memory segments. See shmat(2) for more information on shared memory operations.
ism: Advice applies to shared memory segments that are using the SHM_SHARE_MMU flag. The ism option takes precedence over shm.
dsm: Advice applies to shared memory segments that are using the SHM_PAGEABLE flag. The dsm option takes precedence over shm.
mapshared: Advice applies to mappings established by the mmap() system call using the MAP_SHARED flag.
mapprivate: Advice applies to mappings established by the mmap() system call using the MAP_PRIVATE flag.
mapanon: Advice applies to mappings established by the mmap() system call using the MAP_ANON flag. The mapanon option takes precedence when multiple options apply.
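To make the exec-name:advice-opts format concrete, the following sketch splits one configuration entry into its executable name and its region=advice pairs, and rejects regions that are not in the list above. It is an illustration only and not the parser that madv.so.1 uses. The struct and function names are hypothetical, the buffer sizes are arbitrary, and the advice string is not validated.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OPTS 8  /* arbitrary limit for this sketch */

struct advice_opt { char region[16]; char advice[32]; };
struct cfg_entry {
    char exec_name[128];
    int nopts;
    struct advice_opt opts[MAX_OPTS];
};

static const char *const legal_regions[] = {
    "madv", "heap", "shm", "ism", "dsm",
    "mapshared", "mapprivate", "mapanon"
};

static int
region_is_legal(const char *r)
{
    size_t i;

    for (i = 0; i < sizeof (legal_regions) / sizeof (legal_regions[0]); i++)
        if (strcmp(r, legal_regions[i]) == 0)
            return (1);
    return (0);
}

/* Parse "exec-name:region=advice,..."; return 0 on success, -1 if malformed. */
int
parse_cfg_entry(const char *line, struct cfg_entry *out)
{
    const char *colon = strchr(line, ':');
    const char *p;
    size_t n;

    if (colon == NULL || colon == line)
        return (-1);
    n = (size_t)(colon - line);
    if (n >= sizeof (out->exec_name))
        return (-1);
    memcpy(out->exec_name, line, n);
    out->exec_name[n] = '\0';
    out->nopts = 0;

    /* An empty option list (for example "ls:") is accepted. */
    for (p = colon + 1; *p != '\0'; ) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        const char *eq = memchr(p, '=', len);
        struct advice_opt *o;
        size_t rlen, alen;

        if (eq == NULL || out->nopts == MAX_OPTS)
            return (-1);
        o = &out->opts[out->nopts];
        rlen = (size_t)(eq - p);
        alen = len - rlen - 1;
        if (rlen == 0 || rlen >= sizeof (o->region) ||
            alen == 0 || alen >= sizeof (o->advice))
            return (-1);
        memcpy(o->region, p, rlen);
        o->region[rlen] = '\0';
        memcpy(o->advice, eq + 1, alen);
        o->advice[alen] = '\0';
        if (!region_is_legal(o->region))
            return (-1);
        out->nopts++;
        p += len;
        if (*p == ',')
            p++;
    }
    return (0);
}
```

The sketch expects the entry without a trailing newline, so a caller reading lines from a file would strip the newline first.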
The value of the MADVERRFILE environment variable is the path of the file where error messages are logged. If MADVERRFILE is not set, the madv.so.1 shared object logs errors by using syslog(3C) with LOG_ERR as the severity level and LOG_USER as the facility descriptor.
Memory advice is inherited. A child process has the same advice as its parent. The advice is set back to the system default advice after a call to exec(2) unless a different level of advice is configured using the madv.so.1 shared object. Advice is only applied to mmap() regions explicitly created by the user program. Regions established by the run-time linker or by system libraries that make direct system calls are not affected.
The following examples illustrate specific aspects of the madv.so.1 shared object.
Example 4-2 Setting Advice for a Set of Applications
This configuration applies advice to all ISM segments for applications with exec names that begin with foo.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
foo*:ism=access_lwp
Example 4-3 Excluding a Set of Applications From Advice
This configuration sets advice for all applications with the exception of ls.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADV=access_many
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADV MADVCFGFILE
$ cat $MADVCFGFILE
ls:
Example 4-4 Pattern Matching in a Configuration File
Because the configuration specified in MADVCFGFILE takes precedence over the value set in MADV, specifying * as the exec-name of the last configuration entry is equivalent to setting MADV. This example is equivalent to the previous example.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
ls:
*:madv=access_many
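The behavior in Example 4-4 is consistent with scanning the configuration entries in order and using the first entry whose exec-name matches, falling back to MADV only when nothing matches. The sketch below models that reading. It rests on several assumptions: that first-match order is what madv.so.1 actually does, that fnmatch(3C) is a fair stand-in for its pattern matching, and that a pattern without a slash is compared against the base name. The function name is hypothetical.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first pattern that matches exec_path,
 * or -1 when no entry matches (the caller would then consult MADV).
 */
int
first_matching_entry(const char *const patterns[], size_t n,
    const char *exec_path)
{
    const char *base = strrchr(exec_path, '/');
    size_t i;

    base = (base != NULL) ? base + 1 : exec_path;

    for (i = 0; i < n; i++) {
        /* Assumption: patterns containing '/' match the full path. */
        const char *subject =
            (strchr(patterns[i], '/') != NULL) ? exec_path : base;

        if (fnmatch(patterns[i], subject, 0) == 0)
            return ((int)i);
    }
    return (-1);
}
```

With the entries ls and * in that order, /usr/bin/ls selects the first entry, which carries no advice, and every other executable selects the second.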
Example 4-5 Advice for Multiple Regions
This configuration applies one type of advice for mmap() regions and different advice for heap and shared memory regions for applications whose exec names begin with foo.
$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
foo*:madv=access_many,heap=sequential,shm=access_lwp
The meminfo() function gives the calling process information about the virtual memory and physical memory that the system has allocated to that process.
#include <sys/types.h>
#include <sys/mman.h>
int meminfo(const uint64_t inaddr[], int addr_count,
    const uint_t info_req[], int info_count,
    uint64_t outdata[], uint_t validity[]);
The meminfo() function can return the following types of information:
MEMINFO_VPHYSICAL: The physical memory address corresponding to the given virtual address
MEMINFO_VLGRP: The lgroup to which the physical page corresponding to the given virtual address belongs
MEMINFO_VPAGESIZE: The size of the physical page corresponding to the given virtual address
MEMINFO_VREPLCNT: The number of replicated physical pages that correspond to the given virtual address
MEMINFO_VREPL|n: The nth physical replica of the given virtual address
MEMINFO_VREPL_LGRP|n: The lgroup to which the nth physical replica of the given virtual address belongs
MEMINFO_PLGRP: The lgroup to which the given physical address belongs
The meminfo() function takes the following parameters:
inaddr: An array of input addresses.
addr_count: The number of addresses that are passed to meminfo().
info_req: An array that lists the types of information that are being requested.
info_count: The number of pieces of information that are requested for each address in the inaddr array.
outdata: An array where the meminfo() function places the results. The array's size is equal to the product of the values of the info_count and addr_count parameters.
validity: An array of size equal to the value of the addr_count parameter. The validity array contains bitwise result codes. The 0th bit of the result code evaluates the validity of the corresponding input address. Each successive bit in the result code evaluates the validity of the response to the members of the info_req array in turn.
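The layout of the outdata and validity arrays can be sketched with a few small helpers. The helper names are hypothetical. The row-major indexing of outdata, with info_count results stored consecutively for each address, is inferred from the indexing used in Example 4-6. Plain unsigned int stands in for uint_t so that the sketch does not depend on Solaris headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of a validity word: the input address itself is valid. */
int
addr_is_valid(unsigned int validity)
{
    return ((validity & 1u) != 0);
}

/* Bit j+1 of a validity word: the answer to info_req[j] is valid. */
int
info_is_valid(unsigned int validity, int j)
{
    return ((validity & (1u << (j + 1))) != 0);
}

/* Result for address i and request j, assuming row-major storage. */
uint64_t
result_at(const uint64_t outdata[], int info_count, int i, int j)
{
    return (outdata[(size_t)i * (size_t)info_count + (size_t)j]);
}
```

Because info_count cannot exceed 31, the shift in info_is_valid() stays within a 32-bit word.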
On failure, the meminfo() function returns -1 and sets errno. The function fails with EFAULT when the area of memory to which the outdata or validity arrays point cannot be written to, or when the area of memory to which the info_req or inaddr arrays point cannot be read from. The function fails with EINVAL when the value of info_count exceeds 31 or is less than 1, or when the value of addr_count is less than zero.
Example 4-6 Use of meminfo() to Print Out Physical Pages and Page Sizes Corresponding to a Set of Virtual Addresses
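#include <sys/types.h>
#include <sys/mman.h>
#include <alloca.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void
print_info(void **addrvec, int how_many)
{
    static const uint_t info[] = {
        MEMINFO_VPHYSICAL,
        MEMINFO_VPAGESIZE
    };
    uint64_t *inaddr = alloca(sizeof (uint64_t) * how_many);
    uint64_t *outdata = alloca(sizeof (uint64_t) * how_many * 2);
    uint_t *validity = alloca(sizeof (uint_t) * how_many);
    int i;

    /* Convert each pointer to the 64-bit integer form meminfo() expects. */
    for (i = 0; i < how_many; i++)
        inaddr[i] = (uint64_t)(uintptr_t)addrvec[i];

    if (meminfo(inaddr, how_many, info,
        sizeof (info) / sizeof (info[0]), outdata, validity) < 0) {
        /* The original listing elides error handling; this is one option. */
        perror("meminfo");
        return;
    }

    for (i = 0; i < how_many; i++) {
        /* Bit 0: address; bit 1: MEMINFO_VPHYSICAL; bit 2: MEMINFO_VPAGESIZE. */
        if ((validity[i] & 1) == 0)
            printf("address 0x%llx not part of address space\n",
                (unsigned long long)inaddr[i]);
        else if ((validity[i] & 2) == 0)
            printf("address 0x%llx has no physical page associated with it\n",
                (unsigned long long)inaddr[i]);
        else {
            char buff[80];
            if ((validity[i] & 4) == 0)
                strlcpy(buff, "<Unknown>", sizeof (buff));
            else
                snprintf(buff, sizeof (buff), "%llu",
                    (unsigned long long)outdata[i * 2 + 1]);
            printf("address 0x%llx is backed by physical page 0x%llx of size %s\n",
                (unsigned long long)inaddr[i],
                (unsigned long long)outdata[i * 2], buff);
        }
    }
}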
The kernel assigns a thread to a locality group when the lightweight process (LWP) for that thread is created. That lgroup is called the thread's home lgroup. The kernel runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible. If resources from the home lgroup are unavailable, the kernel allocates resources from other lgroups. When a thread has affinity for more than one lgroup, the operating system allocates resources from lgroups chosen in order of affinity strength. Lgroups can have one of three distinct affinity levels:
LGRP_AFF_STRONG indicates strong affinity. If this lgroup is the thread's home lgroup, the operating system avoids rehoming the thread to another lgroup if possible. Events such as dynamic reconfiguration, processor offlining, processor binding, and processor set binding and manipulation might still result in thread rehoming.
LGRP_AFF_WEAK indicates weak affinity. If this lgroup is the thread's home lgroup, the operating system rehomes the thread if necessary for load balancing purposes.
LGRP_AFF_NONE indicates no affinity. If a thread has no affinity to any lgroup, the operating system assigns a home lgroup to the thread.
The operating system uses lgroup affinities as advice when allocating resources for a given thread. The advice is factored in with the other system constraints. Processor binding and processor sets do not change lgroup affinities, but might restrict the lgroups on which a thread can run.
The lgrp_affinity_get(3LGRP) function returns the affinity that an LWP has for a given lgroup.
#include <sys/lgrp_user.h>
lgrp_affinity_t lgrp_affinity_get(idtype_t idtype, id_t id, lgrp_id_t lgrp);
The idtype and id arguments specify the LWP that the lgrp_affinity_get() function examines. If the value of idtype is P_PID, the lgrp_affinity_get() function gets the lgroup affinity for one of the LWPs in the process whose process ID matches the value of the id argument. If the value of idtype is P_LWPID, the lgrp_affinity_get() function gets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument. If the value of idtype is P_MYID, the lgrp_affinity_get() function gets the lgroup affinity for the current LWP.
On failure, the lgrp_affinity_get() function returns -1 and sets errno. The function fails with EINVAL when the given lgroup or ID type is not valid. The function fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the LWPs. The function fails with ESRCH when a given lgroup or LWP is not found.
The lgrp_affinity_set(3LGRP) function sets the affinity that an LWP or set of LWPs has for a given lgroup.
#include <sys/lgrp_user.h>
int lgrp_affinity_set(idtype_t idtype, id_t id, lgrp_id_t lgrp,
    lgrp_affinity_t affinity);
The idtype and id arguments specify the LWP or set of LWPs whose affinity the lgrp_affinity_set() function changes. If the value of idtype is P_PID, the lgrp_affinity_set() function sets the lgroup affinity for all of the LWPs in the process whose process ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_LWPID, the lgrp_affinity_set() function sets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_MYID, the lgrp_affinity_set() function sets the lgroup affinity for the current LWP or process to the affinity level specified in the affinity argument.
On failure, the lgrp_affinity_set() function returns -1 and sets errno. The function fails with EINVAL when the given lgroup, affinity, or ID type is not valid. The function fails with EPERM when the effective user of the calling process is not the superuser and the real or effective user ID of the calling process does not match the real or effective user ID of one of the LWPs. The function fails with ESRCH when a given lgroup or LWP is not found.
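As a rough sketch of how the two affinity calls fit together, the following function gives the calling LWP strong affinity for its current home lgroup and then reads the affinity back to confirm it. The function name prefer_home_lgroup() is hypothetical. The ENOSYS fallback for systems without liblgrp is an assumption added so that the sketch compiles elsewhere. On Oracle Solaris the program would be linked with -llgrp.

```c
#include <errno.h>

#if defined(__sun)
#include <sys/types.h>
#include <sys/procset.h>    /* P_LWPID, P_MYID */
#include <sys/lgrp_user.h>  /* lgrp_home(), lgrp_affinity_*(); -llgrp */
#endif

/*
 * Hypothetical helper: set strong affinity for the calling LWP's home
 * lgroup. Returns 0 on success, -1 on failure with errno set.
 */
int
prefer_home_lgroup(void)
{
#if defined(__sun)
    lgrp_id_t home = lgrp_home(P_LWPID, P_MYID);

    if (home == -1)
        return (-1);
    if (lgrp_affinity_set(P_LWPID, P_MYID, home, LGRP_AFF_STRONG) != 0)
        return (-1);

    /* Read the value back to confirm that the kernel recorded it. */
    if (lgrp_affinity_get(P_LWPID, P_MYID, home) != LGRP_AFF_STRONG)
        return (-1);
    return (0);
#else
    errno = ENOSYS;
    return (-1);
#endif
}
```

Strong affinity remains advice in this sketch: as noted earlier, events such as processor binding can still cause the thread to be rehomed.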