1 VERITAS CLUSTER SOLARIS

1.1 Overview

* Conf files:

  LLT conf: /etc/llttab [should NOT need to access this]

  GAB conf: /etc/gabtab

  If it contains /sbin/gabconfig -c -n2, GAB waits for two
  systems before seeding the cluster. If both systems were
  down and only one comes up, run /sbin/gabconfig -c -x on
  it to seed the cluster manually.

  Cluster conf: /etc/VRTSvcs/conf/config/main.cf

  Has exact details on what the cluster contains.

* Most executables are in: /opt/VRTSvcs/bin or /sbin

1.2 License status:

* /opt/VRTS/bin/vxlicrep

* To add licenses:

  cd /opt/VRTSvcs/install

  ./licensevcs

1.3 Web administration (port 8181)

* http://10.74.24.122:8181/vcs/index

1.4 Administration with the graphical interface:

* hagui

1.5 Useful cluster information:

* hastatus -summary

* haclus -display

* hares -list

* hasys -display

* hatype -list

* hagrp -list

* hagrp -display

* For the hardware configuration, it can also be useful
  to run prtdiag. On Fujitsu systems there are:

  * /opt/FJSVmadm/sbin/hrdconf -l

  * /usr/platform/FJSV,GPUSK/sbin/prtdiag

1.6 Checking LLT - Low Latency Transport

* /etc/llthosts

* /etc/llttab

* To check the active links for LLT:

  lltstat -n (run on every system)

  lltstat -nvv can also be used. It shows the systems in
  the cluster and the network heartbeats (the links;
  well-configured systems usually have 2)

  * For the port status: lltstat -p

* To check that the module is loaded in the kernel:

  modinfo | grep llt

* To unload the module from the kernel:

  modunload -i llt_id
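
The modinfo check and the modunload step above can be chained, since modunload -i expects the numeric id from the first column of the modinfo output. The following is a minimal sketch, assuming the usual Solaris modinfo column layout (Id, Loadaddr, Size, Info, Rev, Module Name). The function name module_id is hypothetical and not part of VCS or Solaris.

```sh
# module_id NAME: read `modinfo` output on stdin and print the id of
# every module whose name (6th column) is exactly NAME.
# Assumed column layout: Id Loadaddr Size Info Rev Module-Name ...
module_id() {
    while read id loadaddr size info rev name rest; do
        [ "$name" = "$1" ] && echo "$id"
    done
    return 0
}

# Typical use on a cluster node (not executed here):
#   id=$(modinfo | module_id llt)
#   [ -n "$id" ] && modunload -i "$id"
```

Matching the whole module name avoids false hits that a plain grep could produce on other modules containing the same letters.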

1.7 Checking GAB - Group Membership and Atomic Broadcast

* /etc/gabtab (contains the disk heartbeat setup)

* Example gabtab:

  /sbin/gabdiskhb -a /dev/dsk/c2t1d2s3 -s 16 -p a

  /sbin/gabdiskhb -a /dev/dsk/c2t1d2s3 -s 144 -p h

  /sbin/gabconfig -c -n2 (the number after -n is the
  quorum, i.e. how many systems must be up before the
  VCS cluster forms and starts)

* Run /sbin/gabconfig -a

  Special cases:

  * Empty output: GAB is not running

  * If jeopardy appears instead of plain membership,
    a link is broken

* To check GAB on the disks (shows the disk heartbeats):

  gabdiskhb -l

* To check that the module is loaded in the kernel:

  modinfo | grep gab

* To unload the module from the kernel:

  modunload -i gab_id
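
The special cases listed for gabconfig -a can be checked mechanically. This is a rough sketch, assuming that each port is reported on a line starting with "Port" and that a degraded link shows the word "jeopardy" on such a line. The function name gab_state and its three result strings are made up for illustration.

```sh
# gab_state: read `gabconfig -a` output on stdin and print one of:
#   not-running  no port lines at all (GAB is not running or not seeded)
#   jeopardy     at least one port line reports jeopardy
#   ok           only regular membership lines
gab_state() {
    awk '
        /^Port/               { ports++ }
        /^Port/ && /jeopardy/ { jeopardy++ }
        END {
            if (ports == 0)        print "not-running"
            else if (jeopardy > 0) print "jeopardy"
            else                   print "ok"
        }'
}

# Typical use on a cluster node (not executed here):
#   /sbin/gabconfig -a | gab_state
```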

1.8 Checking main.cf

* To check the syntax of
  /etc/VRTSvcs/conf/config/main.cf, use hacf:

  # cd /etc/VRTSvcs/conf/config

  # hacf -verify .

1.9 Global configuration of the groups managed by the cluster

* Start with the file /etc/VRTSvcs/conf/config/main.cf.
  Many more details, such as the default parameter
  values, are shown by the following command:

* /opt/VRTS/bin/hagrp -display

  It shows the configuration, including the service
  group dependencies. For example:

  #Group Attribute System Value

  ClusterService Administrators global

  ClusterService AutoFailOver global 1

  ClusterService AutoRestart global 1

  ClusterService AutoStart global 1

  ClusterService AutoStartIfPartial global 1

  ClusterService AutoStartList global prodsshr1 prodsshr0

  ClusterService AutoStartPolicy global Order

  ClusterService Evacuate global 1

  ClusterService ExtMonApp global

  ClusterService ExtMonArgs global

  ClusterService FailOverPolicy global Priority

  ClusterService FaultPropagation global 1

  ClusterService Frozen global 0

  ClusterService GroupOwner global

  ClusterService IntentOnline global 1

  ClusterService Load global 0

  ClusterService ManageFaults global ALL

  ClusterService ManualOps global 1

  ClusterService NumRetries global 0

  ClusterService OnlineRetryInterval global 0

  ClusterService OnlineRetryLimit global 0

  ClusterService Operators global

  ClusterService Parallel global 0

  ClusterService PreOffline global 0

  ClusterService PreOnline global 0

  ClusterService PreonlineTimeout global 300

  ClusterService Prerequisites global

  ClusterService PrintTree global 1

  ClusterService Priority global 0

  ClusterService Restart global 0

  ClusterService SourceFile global ./main.cf

  ClusterService SystemList global prodsshr1 1 prodsshr0 2

  ClusterService SystemZones global

  ClusterService TFrozen global 0

  ClusterService Tag global 

  ClusterService TriggerEvent global 1

  ClusterService TriggerResStateChange global 0

  ClusterService TypeDependencies global

  ClusterService UserIntGlobal global 0

  ClusterService UserStrGlobal global

  ClusterService AutoDisabled prodsshr0 0

  ClusterService AutoDisabled prodsshr1 0

  ClusterService Enabled prodsshr0 1

  ClusterService Enabled prodsshr1 1

  ClusterService PreOfflining prodsshr0 0

  ClusterService PreOfflining prodsshr1 0

  ClusterService PreOnlining prodsshr0 0 

  ClusterService PreOnlining prodsshr1 0

  ClusterService Probed prodsshr0 1

  ClusterService Probed prodsshr1 1

  ClusterService ProbesPending prodsshr0 0

  ClusterService ProbesPending prodsshr1 0

  ClusterService State prodsshr0 |OFFLINE|
  
  ClusterService State prodsshr1 |ONLINE|
  
  ClusterService UserIntLocal prodsshr0 0
  
  ClusterService UserIntLocal prodsshr1 0

  ClusterService UserStrLocal prodsshr0

  ClusterService UserStrLocal prodsshr1

  #

  beadm_sg Administrators global

  beadm_sg AutoFailOver global 1

  beadm_sg AutoRestart global 1

  beadm_sg AutoStart global 1

  beadm_sg AutoStartIfPartial global 1

  beadm_sg AutoStartList global prodsshr1 prodsshr0

  beadm_sg AutoStartPolicy global Order

  beadm_sg Evacuate global 1

  beadm_sg ExtMonApp global

  beadm_sg ExtMonArgs global

  beadm_sg FailOverPolicy global Priority

  beadm_sg FaultPropagation global 1

  beadm_sg Frozen global 0

  beadm_sg GroupOwner global

  beadm_sg IntentOnline global 1

  beadm_sg Load global 0

  beadm_sg ManageFaults global ALL

  beadm_sg ManualOps global 1

  beadm_sg NumRetries global 0
  
  beadm_sg OnlineRetryInterval global 0
  
  beadm_sg OnlineRetryLimit global 0
  
  beadm_sg Operators global
  
  beadm_sg Parallel global 0
  
  beadm_sg PreOffline global 0
  
  beadm_sg PreOnline global 0
  
  beadm_sg PreonlineTimeout global 300
  
  beadm_sg Prerequisites global
  
  beadm_sg PrintTree global 1
  
  beadm_sg Priority global 0
  
  beadm_sg Restart global 0
  
  beadm_sg SourceFile global ./main.cf
  
  beadm_sg SystemList global prodsshr1 1 prodsshr0 2
  
  beadm_sg SystemZones global
  
  beadm_sg TFrozen global 0
  
  beadm_sg Tag global
  
  beadm_sg TriggerEvent global 1
  
  beadm_sg TriggerResStateChange global 0
  
  beadm_sg TypeDependencies global
  
  beadm_sg UserIntGlobal global 0
  
  beadm_sg UserStrGlobal global

  beadm_sg AutoDisabled prodsshr0 0

  beadm_sg AutoDisabled prodsshr1 0

  beadm_sg Enabled prodsshr0 1

  beadm_sg Enabled prodsshr1 1

  beadm_sg PreOfflining prodsshr0 0

  beadm_sg PreOfflining prodsshr1 0

  beadm_sg PreOnlining prodsshr0 0

  beadm_sg PreOnlining prodsshr1 0

  beadm_sg Probed prodsshr0 1

  beadm_sg Probed prodsshr1 1

  beadm_sg ProbesPending prodsshr0 0

  beadm_sg ProbesPending prodsshr1 0

  beadm_sg State prodsshr0 |ONLINE|

  beadm_sg State prodsshr1 |OFFLINE|

  beadm_sg UserIntLocal prodsshr0 0

  beadm_sg UserIntLocal prodsshr1 0

  beadm_sg UserStrLocal prodsshr0

  beadm_sg UserStrLocal prodsshr1

  #
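
Most of the listing above is static configuration. When only the current placement of the groups matters, the State lines can be filtered out. A small sketch, assuming the four-column layout shown in the sample (group, attribute, system, value); the function name group_states is hypothetical.

```sh
# group_states: read `hagrp -display` output on stdin and print
# "group system state" for every State line, skipping "#" header lines.
group_states() {
    awk '$1 !~ /^#/ && $2 == "State" { print $1, $3, $4 }'
}

# Typical use on a cluster node (not executed here):
#   /opt/VRTS/bin/hagrp -display | group_states
```

Applied to the sample output, this would keep lines such as "ClusterService prodsshr1 |ONLINE|" and "beadm_sg prodsshr0 |ONLINE|".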

1.10 VERITAS Cluster Basic Administrative Operations

1.10.1 Administering Service Groups

* To start a service group and bring its resources
  online 

  # hagrp -online service_group -sys system

* To start a service group on a system (System 1) and
  bring online only the resources already online on
  another system (System 2)

  # hagrp -online service_group -sys system -checkpartial other_system

  If the service group has no resources online on the
  other system, the service group is brought online on
  the original system and the -checkpartial option is
  ignored. The -checkpartial option is used by the
  Preonline trigger during failover: when a service
  group configured with PreOnline = 1 fails over to
  another system (system 2), the only resources brought
  online on system 2 are those that were online on
  system 1 before the failover.
  
* To stop a service group and take its resources
  offline  

  # hagrp -offline service_group -sys system

* To stop a service group only if all resources are
  probed on the system

  # hagrp -offline [-ifprobed] service_group -sys system

* To switch a service group from one system to another

  # hagrp -switch service_group -to system

  The -switch option is valid for failover groups only.
  A service group can be switched only if it is fully
  or partially online. 


* To freeze a service group (disable onlining,
  offlining, and failover)

  # hagrp -freeze service_group [-persistent]

  The -persistent option makes the freeze survive a
  cluster restart. It requires the configuration to be
  writable (haconf -makerw).

* To thaw a service group (reenable onlining,
  offlining, and failover)

  # hagrp -unfreeze service_group [-persistent]

* To enable a service group

  # hagrp -enable service_group [-sys system]

  A group can be brought online only if it is enabled.

* To disable a service group

  # hagrp -disable service_group [-sys system]

  A group cannot be brought online or switched if it is
  disabled.

* To enable all resources in a service group

  # hagrp -enableresources service_group

* To disable all resources in a service group

  # hagrp -disableresources service_group

  Agents do not monitor group resources if resources
  are disabled.

* To clear faulted, non-persistent resources in a
  service group

  # hagrp -clear service_group [-sys system]

  Clearing a resource automatically initiates the
  online process that was blocked while waiting for
  the resource to become clear.

  * If system is specified, all faulted, non-persistent
    resources are cleared on that system only.

  * If system is not specified, the service group is
    cleared on all systems in the group's SystemList on
    which at least one non-persistent resource has
    faulted.

1.10.2 Resources of a service group

* prodsshr0:{root}:/>hagrp -resources beadm_sg
  
  tws_client
  
  beadm_dg

  beadm_mip 

  beadm_mnt

  twsdm_mnt

  prodsshr_mnic

  bea_admin

  TWS_vol

  beadm_vol

  prodsshr0:{root}:/>
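
The resource list above can feed a per-resource state check. A minimal sketch, assuming hagrp and hares are in the PATH and that `hares -state resource -sys system` prints a single state word; the function name group_resource_states is hypothetical.

```sh
# group_resource_states GROUP SYSTEM: list the resources of GROUP with
# `hagrp -resources` and print "resource state" for each one, asking
# `hares -state` for the state on SYSTEM.
group_resource_states() {
    for res in $(hagrp -resources "$1"); do
        echo "$res $(hares -state "$res" -sys "$2")"
    done
}

# Typical use on a cluster node (not executed here):
#   group_resource_states beadm_sg prodsshr0
```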

1.11 VERITAS: VCS Administration Commands

1.11.1 Cluster Start/Stop:

* stop VCS on all systems:

  # hastop -all

* stop VCS on bar_c and move all groups out:

  # hastop -sys bar_c -evacuate

  (or, when run on bar_c itself: # hastop -local -evacuate)

* start VCS on local system:

  # hastart

1.11.2 Users:

* add gui root user:

  # haconf -makerw

  # hauser -add root

  # haconf -dump -makero

1.11.3 Set/update VCS super user password:

* add root user:

  # haconf -makerw

  # hauser -add root

  password:...

  # haconf -dump -makero

* change root password:

  # haconf -makerw

  # hauser -update root

  password:...

  # haconf -dump -makero
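
The user commands above all follow the same pattern: open the configuration, change it, then dump and close it. The pattern can be wrapped once so the closing step is never forgotten. This is only a sketch; the helper names run and with_rw_config and the DRY_RUN variable are assumptions introduced for illustration, not part of VCS.

```sh
# run CMD [ARGS...]: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# with_rw_config CMD [ARGS...]: make the VCS configuration writable,
# run one command, then save the configuration and make it read-only.
with_rw_config() {
    run haconf -makerw || return 1
    run "$@"
    status=$?
    # Always dump and close the configuration, even if the command failed.
    run haconf -dump -makero
    return $status
}

# Typical use (not executed here):
#   with_rw_config hauser -update root
#   DRY_RUN=1; with_rw_config hagrp -freeze groupx -persistent
```

With DRY_RUN=1 the three commands are only printed, which is a cheap way to review the sequence before touching a live cluster.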

1.11.4 Group:

* group start, stop:

  # hagrp -offline groupx -sys foo_c

  # hagrp -online groupx -sys foo_c

* switch a group to other system:

  # hagrp -switch groupx -to bar_c

* freeze a group:

  # hagrp -freeze groupx

* unfreeze a group:

  # hagrp -unfreeze groupx

* enable a group:

  # hagrp -enable groupx

* disable a group:

  # hagrp -disable groupx

* enable the resources of a group:

  # hagrp -enableresources groupx

* disable the resources of a group:

  # hagrp -disableresources groupx

* flush a group:

  # hagrp -flush groupx -sys bar_c

1.11.5 Node:

* freeze node:

  # hasys -freeze bar_c

* thaw node:

  # hasys -unfreeze bar_c

1.11.6 Resources:

* online a resource:

  # hares -online IP_192_168_1_54 -sys bar_c

* offline a resource:

  # hares -offline IP_192_168_1_54 -sys bar_c

* offline a resource and propagate to children:

  # hares -offprop IP_192_168_1_54 -sys bar_c

* probe a resource:

  # hares -probe IP_192_168_1_54 -sys bar_c

* clear faulted resource:
  
  # hares -clear IP_192_168_1_54 -sys bar_c

1.11.7 Agents:
  
* start agent:

  # haagent -start IP -sys bar_c
  
* stop agent:

  # haagent -stop IP -sys bar_c
  
1.11.8 Reboot a node with evacuation of all service groups:

* (groupy is running on bar_c)

* # hastop -sys bar_c -evacuate
  
* # init 6

* # hagrp -switch groupy -to bar_c
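
The last step above only works once VCS is running again on the rebooted node. A possible way to wait for that from the surviving node is sketched below, assuming `hasys -state system` prints the single word RUNNING when the node has rejoined. The function name wait_running and its default retry values are hypothetical, and a POSIX shell such as ksh or bash is assumed.

```sh
# wait_running SYSTEM [TRIES] [DELAY]: poll `hasys -state SYSTEM` until it
# reports RUNNING. Gives up (returns 1) after TRIES attempts, DELAY seconds
# apart. Defaults: 30 tries, 10 seconds.
wait_running() {
    tries=${2:-30}
    delay=${3:-10}
    while [ "$tries" -gt 0 ]; do
        [ "$(hasys -state "$1" 2>/dev/null)" = "RUNNING" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Typical use after the reboot, from the other node (not executed here):
#   wait_running bar_c && hagrp -switch groupy -to bar_c
```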
  
1.12 Starting Cluster Manager (Java Console) and
  Configuration Editor

1. After establishing a user account and setting the
  display, type the following commands to start Cluster
  Manager and Configuration Editor: 

  * # hagui 

  * # hacfed 

2. Run /opt/VRTSvcs/bin/hagui.





