Re: ADSM/6000 and NFS

Dean Roy <DEAN AT ALPHA.UWINDSOR DOT CA> wrote:

>I am currently looking at ADSM/6000 as a possible storage management
>solution for our shop.
>
>I am looking for information on how ADSM works with file systems that
>are mounted via NFS. My situation is this:
>
>        I have users who use one machine and think the files are local but
>in reality they are NFS mounted from a central NFS server. Users are
>not allowed to logon to the central NFS server. My concern is how ADSM
>handles this situation. Which machine would backups be run from - the
>NFS server or client? Does ADSM allow a user on the client machine to
>restore a file to the NFS server?

We are new ADSM users and have run into this problem as well, although it
is solvable, or at least one that can be worked around. We have the
impression that ADSM was first and foremost designed for a collection of
*individual* PCs and workstations and that its support of clusters is still
somewhat half-hearted. From our discussions with IBM, we gathered that
clusters as they exist in universities (we, too, are a university computer
centre) are slowly coming into focus in ADSM development, but I leave it to
IBM to present their plans.

For these discussions, I have prepared a paper that identifies the typical
problems one might encounter in environments like yours and ours. Its
purpose explains its critical undertone. I have appended the paper to this
letter.

Hope this helps. Best regards,

Helmut Richter

 ============================================================================
Dr. Helmut Richter
Leibniz-Rechenzentrum     X.400:  S=Richter;OU=lrz;P=lrz-muenchen;A=d400;C=de
Barer Str. 21            RFC822:  Helmut.Richter AT lrz-muenchen DOT de
D-80333 Muenchen           Tel.:  ++49-89-2105-8785
Germany                     Fax:  ++49-89-2809460
 ============================================================================


Usability of ADSM In a Multi-User Cluster Environment
=====================================================

Various requests have pointed out the need for better support of
workstation clusters serving many users (as opposed to many independent
workstations, each serving a single user). A large number of individual
requests may be helpful for prioritizing necessary updates to the
product; however, the overall picture of what is needed may get lost.
In this paper, I try to summarize the most important requirements,
irrespective of how or when they should be implemented.

Contents:

1. Requirements for multi-user sites
2. Requirements for clusters
3. Additional requirements for AFS or DFS clusters

---
1. Requirements For Multi-User Sites
------------------------------------
In this section, the problems with using ADSM on workstations with
many users are summarized. Additional problems arising only in
clusters interconnected via NFS, AFS, or DFS are postponed to later
sections.

To understand the requirements, it is important to distinguish between
different classes of ADSM users, each having different requirements:

a) Individual ADSM users

   An "individual" ADSM user is one who constitutes a "node" of his own
   with a password known to this user only. Such a node typically refers
   to the data on the user's PC or workstation. Individual users need to
   back up or archive data and to retrieve data that they backed up or
   archived themselves or that other users have explicitly granted them
   access to under ADSM.

b) Workstation administrators (root users)

   As far as system files are concerned, a workstation administrator has
   the same requirements as an individual ADSM user. In addition, he must
   be able to operate on data belonging to users other than himself: he
   must be able to make regular backups of the entire user space, restore
   files that have been destroyed (e.g. by a disk hardware failure), and
   restore single files at users' request. The backups produced by the
   administrator should be restorable by the user who owns the data but
   not by any other user.

c) Users who do not administer the workstations they are using

   These are users who have no individual ADSM passwords; therefore,
   either their Unix or their Kerberos authentication must be used to
   identify them to ADSM (the "generate" password mechanism).
   Typically, these are the average users of the workstation who are not
   concerned with its administration. These users have only a restricted
   set of requirements: they must be able to archive and retrieve data,
   and to restore data that the workstation administrator has backed up.
   The ability to trigger backups themselves should be an option at the
   administrator's discretion; if this option is enabled, user-initiated
   backups should merge with regular backups (i.e. they should behave as
   if the administrator had made the backup and, in particular, inhibit
   another selective backup until the data are next modified).

d) ADSM administrators

   Whoever administers ADSM (the entire server or subsets thereof) needs
   the ability to control access to ADSM, to define policies, and to
   monitor and control the amount of resources for each user.  In
   addition to the facilities currently supported by ADSM, there is a
   requirement to provide space management (data migration).
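The merging rule for user-initiated backups in class (c) can be sketched in a few lines. This is a minimal Python illustration, not ADSM code: `BackupCatalog`, `request_backup` and the `allow_user_backups` flag are hypothetical names, and the catalog is assumed to remember, per file, the modification time of the most recent backup copy regardless of who requested it.

```python
from dataclasses import dataclass, field


@dataclass
class BackupCatalog:
    """Hypothetical per-node catalog: path -> mtime of the newest backup copy."""
    allow_user_backups: bool = False   # option left to the workstation administrator
    last_backup_mtime: dict = field(default_factory=dict)

    def request_backup(self, path, mtime, owner, requested_by, is_admin):
        """Return True if a new copy is made, False if the request is skipped."""
        if not is_admin:
            if not self.allow_user_backups:
                raise PermissionError("user-initiated backups are disabled")
            # Assumption: an ordinary user may only trigger backups of his own files.
            if requested_by != owner:
                raise PermissionError(f"{requested_by} does not own {path}")
        # Merging rule: a copy made by anyone counts, so nothing is sent again
        # until the file has been modified since the last backup.
        if self.last_backup_mtime.get(path) == mtime:
            return False
        self.last_backup_mtime[path] = mtime
        return True
```

With user backups enabled, a copy triggered by the owner of an unchanged file makes the administrator's next scheduled run skip that file; only a new modification time makes it eligible again.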

Individual ADSM users (class (a) above) pose no problems; in fact, ADSM
seems to have been designed first and foremost for them. However, in
multi-user environments, i.e. where most users belong to class (c)
above, ADSM has the following shortcomings:

 (1) Backups made by the workstation administrator start at a directory
     hierarchy level that is not suitable for the end user. Yet the
     graphical restore interface offers this level, i.e. a point above the
     user's home directory, as the starting point for the search, forcing
     the user to browse the directories of all peer users in order to
     select his own. This is not only an extremely user-unfriendly
     interface but also very time-consuming. For both reasons, the command
     line interface, clumsy as it is, is still easier to use. A sensible
     implementation would offer both the user's home directory and the
     user's current working directory as starting points for the search
     through the database.

 (2) { Item deleted. }

 (3) In the multi-user case, many users share one ADSM node. Yet
     administrators are offered no statistics whatsoever about the
     resources taken up by each user of that node, let alone any aid for
     proactive resource management, e.g. quotas.

 (4) The protection model is described in the manuals in a slipshod
     manner. For example, the fact that ADSM defines a root user as a
     user who knows the node's password (and not at all as a user who is
     "root" on the system) must be discovered through tedious experiments,
     and failing to discover it may result in severe security flaws.
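Item (3) would not need much machinery: if the server recorded the Unix owner of each stored copy, per-user statistics and a simple quota check for a shared node would amount to a grouping step. The Python sketch below assumes such a hypothetical `StoredObject` record and a hypothetical per-user quota table; it does not claim that ADSM keeps this information in this form.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class StoredObject(NamedTuple):
    """Hypothetical catalog entry for one backup or archive copy on a shared node."""
    path: str
    owner: str   # Unix owner recorded when the copy was made (assumption)
    size: int    # bytes held on the ADSM server


def usage_by_owner(objects: Iterable[StoredObject]) -> dict:
    """Sum the stored bytes per owner within one shared node."""
    totals = defaultdict(int)
    for obj in objects:
        totals[obj.owner] += obj.size
    return dict(totals)


def over_quota(totals: dict, quotas: dict, default_quota: int) -> list:
    """Return the users whose stored bytes exceed their (hypothetical) quota."""
    return sorted(user for user, used in totals.items()
                  if used > quotas.get(user, default_quota))
```

A per-node report is then `usage_by_owner(...)`, and `over_quota` gives the administrator the starting point for the pro-active management the item asks for.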


2. Requirements For Clusters
----------------------------
Today's client/server configurations typically consist of a number of
client workstations sharing data served to them by a file server via
NFS, AFS or DFS.  The file server in turn often consists of more than
one physical machine, especially in a DCE environment. In what follows,
we call such a configuration a "cluster", but it should be kept in mind
that clusters are not separated from each other: the sets of
workstations on which a given collection of data is accessible will
overlap. This already happens with NFS clusters, but much more so (and
worldwide) in a DCE environment.

ADSM, however, is not designed to cope with a situation in which many
workstations share the same data:

 (5) The notion of "node" in ADSM typically refers to one workstation.
     In a clustered environment, however, a "node" should refer to a
     collection of files spanning more than one workstation but excluding
     the local file systems of each of the affected workstations. Such
     nodes are not supported by ADSM.

     An obvious work-around is to use the same node name on all
     workstations belonging to one cluster; otherwise, files backed up or
     archived on one workstation in the cluster cannot be restored or
     retrieved on another workstation in the same cluster (in larger
     clusters, one would typically even have a server to which the ordinary
     user has no access). As node names are the unit of security in ADSM,
     using the same node name and password on many workstations is a
     security risk. Also, this work-around does not properly handle the
     overlap between clusters described above, where the same files are
     accessible from workstations of different clusters.

     Because the documentation is skimpy, it is tedious to set up scheduled
     backups properly that combine the default node (i.e. the node common
     to all workstations in the cluster) for shared files with the
     workstation-specific node for the workstation's private files.

 (6) In a clustered environment, files usually do not physically reside
     on the host where the ADSM client is invoked. This makes the transfer
     of files between ADSM server and client extremely inefficient: if a
     user on workstation "NC" archives a file to the ADSM server "AS", the
     file is transferred from the NFS server "NS" to "NC" (so that the ADSM
     client is able to access the data) and from there to "AS". In many
     network topologies, the link between "NC" and "NS" is the only
     connection between "NC" and the rest of the world, so that in this
     case the transfer goes like this:

        NS --> NC --> NS --> AS

     Upon retrieval of the file, the same detour is taken again.
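The cost of this detour can be made concrete with a little arithmetic. The Python sketch below counts how many bytes each link carries for the route shown above and for a direct transfer from "NS" to "AS"; the 100 MB file size is an arbitrary assumption, and the topology is the one described in (6), in which "NC" reaches the rest of the network only via "NS".

```python
from collections import Counter


def link_load(route, size):
    """Bytes carried by each (undirected) link when `size` bytes follow `route`."""
    load = Counter()
    for a, b in zip(route, route[1:]):
        load[frozenset((a, b))] += size
    return load


SIZE_MB = 100  # arbitrary example file size

detour = link_load(["NS", "NC", "NS", "AS"], SIZE_MB)  # client on NC reads and sends the file
direct = link_load(["NS", "AS"], SIZE_MB)              # data sent from where it resides

for name, load in (("detour", detour), ("direct", direct)):
    links = {"-".join(sorted(link)): mb for link, mb in load.items()}
    print(name, links, "total:", sum(load.values()), "MB")
```

In the detour case the link between "NC" and "NS" carries the file twice (200 MB) and total traffic is 300 MB, three times that of the direct transfer; the same ratio applies on retrieval.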

Note that items (5) and (6) are closely related:

If ADSM supported distributed file systems, the file to be archived
would be identified as belonging to the cluster and not to the
individual workstation. As a result, the ADSM node name of the cluster
would be selected (solving item (5)), and the transfer would take place
directly between "NS" and "AS" (solving item (6)).

In other words, ADSM should be able to take the possibility of
distributed file systems into account. This is independent of whether
NFS, AFS, or DFS is the underlying distributed file system.  However,
the data about the file system that must be kept within ADSM may be
different for the three cases.
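On an NFS client, one way such an identification might work is a longest-prefix match of the file's path against the mount table. The Python sketch below is an illustration under stated assumptions only: the mount table is hard-coded rather than read from the system, and the `adsm_node` column, which maps each file system to the node name under which its data should be stored, is hypothetical and not an ADSM setting.

```python
import posixpath
from typing import NamedTuple


class Mount(NamedTuple):
    mount_point: str  # where the file system appears on this workstation
    server: str       # host that physically holds the data ("local" for local disks)
    adsm_node: str    # hypothetical: node name the data should be stored under


def resolve(path, mounts):
    """Return the mount whose mount point is the longest component-wise prefix of path."""
    path = posixpath.normpath(path)
    best, best_len = None, -1
    for m in mounts:
        mp = posixpath.normpath(m.mount_point)
        # Compare whole components so that "/homework" does not match "/home".
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if len(mp) > best_len:
                best, best_len = m, len(mp)
    if best is None:
        raise LookupError(f"no mount covers {path}")
    return best


# Hypothetical mount table of workstation NC.
MOUNTS = [
    Mount("/", "local", "NC"),         # local disks of NC
    Mount("/home", "NS", "CLUSTER1"),  # NFS-mounted from the file server NS
]
```

Here `resolve("/home/anna/paper.tex", MOUNTS)` yields the "/home" entry, so the copy would be stored under the cluster's node name and could be sent from "NS", while `/tmp` files stay with the workstation's own node. For AFS or DFS, the mount table would presumably give way to the cell and location information these file systems already provide.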


3. Additional Requirements For AFS or DFS Clusters
--------------------------------------------------
If, in a cluster, the distributed file system is AFS or DFS, the above
problem areas may require other solutions. Also, the use of these file
systems may raise additional requirements. None of the requirements
listed below is currently fulfilled by ADSM.

 (7) As pointed out in (4) above, ownership and access rights under
     ADSM are not defined precisely enough. Insofar as Unix file ownership
     is used for this purpose, the approach becomes very questionable in a
     DCE environment, where Unix file ownership is irrelevant.

     Instead of creating new access control rules for ADSM to reflect this
     scenario, the only reasonable solution is for ADSM to respect the
     access control information of AFS or DFS as well. The backup of an
     AFS or DFS file must be readable by a user if and only if its original
     was; anything else creates security holes, because users will be
     unable to keep track of how the incompatible access controls of DCE
     and ADSM interact. This must include an option for the user to change
     the access control list of a file copy that is currently held by ADSM.

 (8) The preceding item is hardly conceivable without using Kerberos
     (and the same Kerberos as used by AFS or DFS) for authentication to
     ADSM. Sharing passwords among many users, or keeping a separate
     password for each user, is not a viable security strategy. At present, if
     an intruder manages to get another user's UID but not a valid Kerberos
     ticket, he cannot read the compromised user's files but he can read
     their ADSM backups. The additional security of Kerberos is thus
     undermined.

 (9) As pointed out in (5) above, the entity an ADSM node refers to
     should be a collection of possibly distributed files and not a single
     workstation. In the AFS and DFS case, these collections have names:
     the AFS or DCE cells. Inasmuch as ADSM node names serve to distinguish
     different files with identical names, this function should be taken
     over by cell names.

(10) For AFS or DFS files, transferring data directly to and from the
     physical location of the file (see (6) above) is facilitated by the
     fact that these file systems provide an interface for determining this
     location. Note, however, that the physical location of an AFS volume
     or DFS data set may change at any time without notice to any
     application, including ADSM.

(11) Recovery from disk failure requires more information than just
     the files and their access control lists. At present, this can only be
     achieved by backing up all data twice: once on a file basis to allow
     for individual restoration and a second time on a volume basis to
     allow for disaster recovery. The second such backup is only available
     as an unsupported additional feature from IBM.

     The correct solution is probably to separate volume data from file
     data in the backups so that volumes may be backed up and restored
     without their data. For disaster recovery, one would first restore the
     affected volumes and then fill them up with missing files.

(12) Space management requires modifications to DCE's layout of file
     data: metadata must be kept separately from file data in order to
     allow relocation of the latter without affecting the former. This
     should typically include the option to store data at more than one
     physical location. Both these features have been implemented by the
     Pittsburgh Supercomputing Center as "Multi-Resident AFS". A similar
     feature in DFS is most urgently needed to allow space management.
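Items (7) to (9) fit together naturally: a backup catalog keyed by cell name rather than node name, storing a snapshot of each file's ACL, and checking every request against a Kerberos-authenticated principal. The Python sketch below illustrates that combination under several assumptions: Kerberos authentication is taken to have happened already (the code only sees the principal name and its groups), the rights letters "r" (read) and "a" (administer) are modeled loosely on AFS ACLs, and the `Catalog` and `BackupCopy` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BackupCopy:
    """Hypothetical catalog entry: a backup copy plus a snapshot of its original ACL."""
    acl: dict  # principal or group name -> set of rights, e.g. {"r", "a"}


@dataclass
class Catalog:
    # Keyed by (cell, path) instead of (node, path), as suggested in (9).
    copies: dict = field(default_factory=dict)

    def _rights(self, cell, path, principal, groups):
        """Union of the rights granted to the principal and to its groups."""
        acl = self.copies[(cell, path)].acl
        rights = set(acl.get(principal, set()))
        for g in groups:
            rights |= acl.get(g, set())
        return rights

    def can_read(self, cell, path, principal, groups=()):
        # (7): the copy is readable if and only if the original ACL granted read.
        # `principal` is assumed to come from a verified Kerberos ticket, as in (8).
        return "r" in self._rights(cell, path, principal, groups)

    def set_acl(self, cell, path, principal, new_acl, groups=()):
        # (7): holders of the administer right may change the ACL of the copy.
        if "a" not in self._rights(cell, path, principal, groups):
            raise PermissionError(f"{principal} may not change the ACL of {path}")
        self.copies[(cell, path)].acl = {k: set(v) for k, v in new_acl.items()}
```

Because the ACL travels with the copy, an intruder who merely assumes another user's UID gains nothing: without a ticket for the owning principal, neither the file nor its backup is readable.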

Devising clean interfaces for implementing (11) and (12) is probably
the job of DFS development more than ADSM development. All the same,
IBM should start integrating these two efforts lest their ADSM product
become unusable as customers migrate to DCE.

-- end of text --
=========================================================================