Scheduling: Managing Alfred Servers and User Access


Intro

As dispatchers process their jobs they frequently need to assign remote servers and other limited resources to the commands they are launching. It is the maitre-d's role to arbitrate among requests from dispatchers on the network which are competing for these resources.

The maitre-d process is a version of alfred which is running in a special background service daemon mode. It makes its decisions based on the information defined in the current master schedule. The schedule data can be defined, and modified, using the interfaces illustrated below.

See the Operational Details of the Maitre-d discussion for more information on the functions that the maitre-d performs during job processing.

 

Making Remote Servers Available to Alfred

You will be able to make changes to the global scheduling data if you have write-permission on the underlying schedule file ($RATTREE/lib/alfred/alfred.schedule, on the maitre-d host). Permission to edit this file is indicated by the Write Schedule button; if this button is disabled, you do not have the appropriate permissions.

To add a server to the Alfred schedule, select the hostname from the Network Hosts list and press the Include button. This will generate an Alfred server entry with the default attributes, which should be appropriate for use with the other RenderMan Artist Tools.

Note: for a server to be useful, the appropriate server software must be running on the named host. For remote rendering this usually means installing, and starting, alfserver.

Servers are defined by several attributes in the Alfred schedule:

 

Saving the New Server Definitions

Press the "Write Schedule" button to save the new schedule data to disk and update the maitre-d. The old schedule file will be archived by the maitre-d into the $RATTREE/lib/alfred/oldSchedules directory.

If the small "modified..." caption is showing it means that you have changed some of the server definitions during your editing session.

 

Filtering the Host List

There are two methods for controlling the number of host names which appear in the "Network Hosts" list:

 

Temporarily Offline Servers

The right-mouse button pops up a menu over the selected servers which allows you to temporarily mark them as offline or unavailable. The same menu is used to re-enable them.

This is sometimes useful when a machine is down for maintenance, or should otherwise be avoided by all Alfred dispatchers.

Enabled servers are marked with a dark blue diamond.
Disabled servers are indicated with an outlined diamond.

Note: Temporarily disabling a server in this manner is much cleaner than going into the (advanced) Crew definitions and turning it off in all of the server groups that reference it. It also makes it easy to bring the server back online again later.

 

 

 


Advanced Scheduling Topics


 

Group Hierarchies

The resources in the schedule are organized as a hierarchy of groups which define, in successively more detail: where, when, and for whom remote services are available.

These structured groups are particularly useful at sites which need to organize several, possibly overlapping, groups of hosts and users. For example, it can be useful to allow rendering at night on machines that are used for another purpose during the day, or to assign work for specific projects to particular servers.

 


Viewing and editing the master schedule

From the main alfred job queue window select "master schedule..." from the Scheduling menu. This is the top-level interface to the schedule.

NOTE: the "write schedule" button will only be enabled if the user has write-access to the master schedule file.

 

Crews: Users + Servers + Times

An alfred crew simply defines a single combination of remote servers, users who have access to them, and the times they are available. Each defined crew has a name, which is arbitrary. Each crew (a one-line entry in the dialog shown above) specifies the users, servers, and times via names of groups of these things, which are defined in other dialogs.

Important: Individual crews can be toggled on and off by clicking on the crew name. This enables or disables an entire crew definition. Only the enabled entries (with the indented, highlighted checkbox indicators) will be considered during schedule queries.

To modify a crew definition click on the lock icon to unlock the editable items. The crew name can be changed just by clicking on the current name and typing a new one. The user, server, and time group associated with a crew can be changed by selecting a new one from the pull down menu for that group in the crew entry.

The "Pri" field defines a scheduling priority for each crew. When servers are scarce this weighting factor is used to determine which of several competing users (dispatchers) gets the next available server. Higher numbers give the most weight; in this example the "production" crew has the highest priority, which means that everyone listed in the "produce" user group will have highest priority on servers in the "renderers" group.

Group definitions can overlap, so an individual user may be listed in several user groups for example. In these cases, a dispatcher's access is defined to be the union of all the crews in which it is a member; its priority is the highest of those available.
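The overlap rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule (access is the union over matching crews, priority is the highest among them), not the maitre_d's actual implementation. The Crew class, the access_for function, and the user and server names are hypothetical, and time groups are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crew:
    """Hypothetical stand-in for one crew line in the master schedule."""
    name: str
    users: frozenset      # members of the crew's user group
    servers: frozenset    # members of the crew's server group
    priority: int         # the "Pri" field; higher numbers carry more weight
    enabled: bool = True  # crews can be toggled on and off


def access_for(user, crews):
    """Return (servers, priority) for a user across all enabled crews.

    Servers are the union over every enabled crew listing the user;
    priority is the highest "Pri" among those crews (None if no crew matches).
    """
    servers = set()
    priority = None
    for crew in crews:
        if not crew.enabled or user not in crew.users:
            continue
        servers |= crew.servers
        if priority is None or crew.priority > priority:
            priority = crew.priority
    return servers, priority


# Illustrative data only; none of these values come from a real schedule.
crews = [
    Crew("production", frozenset({"ann", "bob"}), frozenset({"r1", "r2"}), 100),
    Crew("nightshift", frozenset({"bob", "cy"}), frozenset({"d1"}), 10),
    Crew("retired", frozenset({"bob"}), frozenset({"x1"}), 500, enabled=False),
]
servers, priority = access_for("bob", crews)
```

Here "bob" belongs to two enabled crews, so he can reach the servers of both and inherits the larger of the two priorities; the disabled crew contributes nothing.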

To add a new crew definition (using existing user, server, and time groups), select "add new crew" from the Crews menu at the top. The default settings are those of the most recently edited entry.

 

To define a new user group, or to see the current definition of an existing group, select "view/edit" from the User Group menu at the top of the column. This brings up a group definition dialog. Each currently defined group is listed in the left column; click on a group name to highlight its members in the right column. Note: if you click on the names in the right column you will change the members of the currently selected group!

To create a new group, or add new users, select the appropriate entry from the pull-down menus at the top of the dialog. New userids are entered individually or brought in via /etc/group name. As a special case, '*' is the predefined name of a user group, and it matches everyone's login. (More explicitly, when the maitre_d comes across a system-time-user binding which uses '*' as the user group it substitutes "the current user.")  


Server groups

The Server Group "view/edit" menu brings up the remote server definition dialog. Again, there are two components to the dialog, the group names on the far left, and the individual server definitions on the right. Clicking on a group name ("testing" in this example) selects all the servers which are part of that group, shown with amber, indented checkbutton indicators. Clicking on the lock button opens a given entry for editing. The group names (e.g. testing) should probably indicate something about the common server type or function of their constituent services. For example some sites might want to group their hosts by machine type, while others might want to group them according to the function they serve, or possibly their physical location.

Briefly, remote servers are selected by Service Name using the Selection Keys as search criteria (and depending on other schedule permissions); when a server is bound to a command running on a particular dispatcher it is "checked out" and unavailable to other dispatchers until the command completes and it is checked back in to the maitre_d.

The Service Names column defines the name by which alfred will refer to a remote server. These are arbitrary (one word) names, which allows project-specific or service-specific naming conventions to be used instead of actual host names. It is also possible to have several parallel services defined on the same physical machine, since a service is defined to be a particular application (i.e. server daemon) running on a particular host. For example, the machine named "janus" is a two-processor system running an alfserver which has been configured to support two netrendering slots; these are expressed as two identical service entries with different service names (Janus1 and Janus2).

The Selection Keys field for each service is a list of arbitrary (blank delimited) words which the maitre_d uses to match a service request to a particular host. In an alfred script each Cmd needing a remote server specifies a simple keyword search expression which is compared against this list. For example, scripts generated by MTOR expect that systems running a copy of alfserver will have the pixarNRM key defined. These keys also provide a mechanism for describing a single server host which has several mutually exclusive services available on it. For example, a single-processor server may have a netrender slot enabled and also support remote MTOR RIB generation, but it would be inefficient to do both of these things simultaneously. By defining a single service for that host, with keys that match both types of server requests, it is possible to ensure that the maitre_d only hands out the host for one kind of work at a time.
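The check-out behavior and the mutually exclusive service idea can be illustrated with a small Python sketch. It is an assumption-laden model, not the maitre_d's code: the real keyword search expression syntax is not described here, so the sketch simply requires every requested key to be present. The Service class, the function names, the host name "solo", and the "ribgen" key are hypothetical; only pixarNRM comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical model of one service entry in a server group."""
    name: str
    host: str
    keys: set                  # the blank-delimited Selection Keys
    checked_out: bool = False  # True while bound to a dispatcher's command


def matches(service, wanted_keys):
    """Assumed matching rule: every requested key must be listed."""
    return set(wanted_keys) <= service.keys


def check_out(services, wanted_keys):
    """Bind the first free service that matches, or return None."""
    for service in services:
        if not service.checked_out and matches(service, wanted_keys):
            service.checked_out = True
            return service
    return None


def check_in(service):
    """Release a service once its command has completed."""
    service.checked_out = False


# One single-processor host offering two kinds of work through ONE service,
# so it can only be handed out for one of them at a time.
pool = [Service("solo", "solo", {"pixarNRM", "ribgen"})]
render = check_out(pool, ["pixarNRM"])  # binds "solo" for rendering
second = check_out(pool, ["ribgen"])    # None: the host is already busy
```

Had the host been defined as two separate services, one per key, both requests could have been granted at once, which is exactly the situation the shared entry avoids.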

The UDI field is the Universal Desirability Index associated with each service. This integer value provides a simple ranking mechanism. When a request for a server of a particular type arrives, the maitre_d searches its list of available servers in a specific order until it gets a match. Candidate slots are pre-sorted by their UDI value, from numerically highest to numerically lowest. Hence, those server slots with the highest UDI value are considered to be the most desirable machines, and they will be handed out first.

The actual desirability values used are arbitrary, and sites can use any scheme they want for defining them. Often, the CPU speed and available memory are important factors when determining which machines are the most desirable. It can also be useful to rank users' desktop machines lower than any systems which are intended to be servers only, so that the user machines get used only when the renderfarm is completely in use.
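As a rough illustration of UDI ordering, the following Python sketch sorts candidate slots from highest to lowest UDI and hands out the first free one that carries the requested key. The dictionary layout, the function names, the UDI numbers, and the desktop name "desk1" are all made up for the example; how the maitre_d orders slots with equal UDI values is not stated here, so the sketch just keeps their listed order.

```python
def rank_by_udi(services):
    """Sort candidate slots from numerically highest UDI to lowest.

    sorted() is stable, so slots with equal UDI keep their listed order;
    that tie behavior is an assumption of this sketch.
    """
    return sorted(services, key=lambda s: s["udi"], reverse=True)


def first_match(services, wanted_key):
    """Return the name of the most desirable free slot listing wanted_key."""
    for slot in rank_by_udi(services):
        if slot["free"] and wanted_key in slot["keys"]:
            return slot["name"]
    return None


# Illustrative values: a desktop ranked well below the dedicated slots.
slots = [
    {"name": "desk1", "udi": 10, "keys": {"pixarNRM"}, "free": True},
    {"name": "Janus1", "udi": 100, "keys": {"pixarNRM"}, "free": True},
    {"name": "Janus2", "udi": 100, "keys": {"pixarNRM"}, "free": True},
]
choice = first_match(slots, "pixarNRM")
```

With these numbers the desktop is only selected once both dedicated slots are busy, which is the effect of ranking user machines below server-only systems.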

Note: a common configuration mistake is to define a new service, but to forget to make it part of one of the server groups. Or sometimes a group is properly defined but no crews reference it.

 


Time groups

Time groups are just blocks of hours through a one-week period. The striped box indicates the current time. In edit mode, new time periods can be defined. Times are selected by dragging out rectangular regions of either enabled times (left mouse) or disabled times (middle mouse). Remember, just as with the other group editors, changing the current time block diagram changes the definition of the currently selected group (weekdays in this case). If you want to experiment, first create a new group using the "Time Groups" menu.
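One way to picture a time group is as a set of enabled (weekday, hour) cells in a one-week grid, where dragging out a rectangle turns a block of cells on or off. The Python sketch below models that idea only; the TimeGroup class, its method names, and the one-hour granularity are assumptions, and the schedule file's real representation is not shown in this document.

```python
from datetime import datetime


class TimeGroup:
    """Hypothetical one-week grid of enabled hours (Monday is weekday 0)."""

    def __init__(self, name):
        self.name = name
        self.enabled = set()  # (weekday, hour) cells that are switched on

    def set_block(self, days, hours, on=True):
        """Enable or disable a rectangular region, like a mouse drag."""
        cells = {(day, hour) for day in days for hour in hours}
        if on:
            self.enabled |= cells
        else:
            self.enabled -= cells

    def is_active(self, when):
        """True if the given datetime falls inside an enabled cell."""
        return (when.weekday(), when.hour) in self.enabled


# Weekday nights: enable Monday-Friday all day, then carve out 08:00-17:59.
nights = TimeGroup("weeknights")
nights.set_block(range(0, 5), range(0, 24), on=True)
nights.set_block(range(0, 5), range(8, 18), on=False)
monday_evening = nights.is_active(datetime(2024, 1, 1, 22, 0))
```

A crew that references such a group would only be considered while is_active is true, which is how machines used for other purposes during the day can be offered for rendering at night.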


Limits and Tags

See the Limits document.


Server Ping Commands

See the Pings document.


NIMBY

Sometimes when the master schedule lists users' desktop machines as potential remote servers, the affected users can become quarrelsome. In fact, their interactive performance usually suffers when a dispatcher launches a remote rendering on their machine. These users can run "alfred -nimby", which places a small window on their desktop and opens a connection to the maitre_d. The NIMBY process (for Not In My Back Yard) temporarily blocks the maitre_d from binding their host as a remote server (the dispatcher will show a "shields raised" message in its watch servers dialog). The NIMBY process will allow dispatching to its host when the user has been idle for a predetermined period (the screensaver idle interval). When the user returns to their screen they can use the NIMBY user-interface to eject the running command from their machine (it will be restarted on another server). There's also an auto-eject mode that interrupts any running alfred jobs automatically when the mouse is moved or there is keyboard activity.
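The NIMBY behavior described above amounts to a small state machine, which the following Python sketch tries to capture. It is a simplified model built only from the description in this section; the Nimby class, its fields, and the use of seconds for the idle interval are hypothetical, and it does not talk to a real maitre_d.

```python
from dataclasses import dataclass


@dataclass
class Nimby:
    """Hypothetical model of one desktop running the NIMBY process."""
    idle_threshold: float       # seconds; stands in for the screensaver idle interval
    auto_eject: bool = False    # eject running work as soon as the user returns
    idle_seconds: float = 0.0   # time since the last mouse or keyboard activity
    running_job: bool = False   # True while a dispatched command runs here

    def shields_raised(self):
        """Host is blocked from dispatch until the user has been idle long enough."""
        return self.idle_seconds < self.idle_threshold

    def tick(self, seconds):
        """Advance the idle clock with no user activity."""
        self.idle_seconds += seconds

    def user_activity(self):
        """Record mouse/keyboard activity; return True if a job gets ejected."""
        self.idle_seconds = 0.0
        if self.auto_eject and self.running_job:
            self.running_job = False  # the command would restart on another server
            return True
        return False


desk = Nimby(idle_threshold=600, auto_eject=True)
desk.tick(900)                   # user away for fifteen minutes
available = not desk.shields_raised()
desk.running_job = True          # a dispatcher binds the idle host
ejected = desk.user_activity()   # user comes back and moves the mouse
```

Without auto_eject the running command would stay in place after the user returns, and the user would eject it by hand from the NIMBY window.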

Note that the alfred scheduling scheme tries to err on the side of making servers available whenever possible, which is why users must do something active (i.e. start alfred -nimby) to keep work from being dispatched to their systems. Obviously there are other ways to keep a system from getting jobs, for example the dispatching user can deselect the server in their huntgroup dialog, or the server can be disabled or removed from the master schedule. And as with many things, some of the hardest resource allocation problems are more social than technical, and are therefore (thankfully) not really alfred's to address.

A note about screensavers: if a desktop system is being used as a remote rendering/compute server, users should be careful to have low-overhead screensavers enabled, "Blank" being the simplest and lowest overhead. Some of the more exciting screensavers consume enough CPU cycles to make remote computing pointless.

 

Pixar Animation Studios
(510) 752-3000 (voice)   (510) 752-3151 (fax)
Copyright © 1996- Pixar. All rights reserved.
RenderMan® is a registered trademark of Pixar.