Things admins typically need to take care of. This checklist is especially helpful when we join a new company.
- Infrastructure overview:
- Total servers (VMs) – Windows and Linux server counts
- Total physical servers
- DRAC, iLO details – log in to each server to check whether the credentials work
- Server (VM) naming conventions
- Account creation – naming conventions
- Patching tool details, Monitoring tool details
- Patching compliance details
- Domain and domain controller details; check replication status
- Domain controllers – AD backup; enable the AD Recycle Bin
- File share details, backup details
- VM backup details – retention period
- vSphere version
- Antivirus on VMs
- Antivirus manager console login credentials; check for AV clients with outdated virus definitions
- VMs without VMware Tools; VMware Tools installation and upgrades on VMs
- Product / Tool owners
- Server owners
- Cluster details (IPs, witness server details)
- Are there outdated OSes that need to be upgraded? Is there a business need to keep any outdated OS?
- KB location – SharePoint, Confluence, etc.
- Procedure for maintenance activities, reboots, patch installation, etc.
- IP ranges, subnets, etc. for VMs
- If there is a VPN or VIP, the manager console login credentials
- Ticket: SLA and OLA
- What do we take care of, and what don’t we?
- License list – contact details
- Application and hardware support contact details
- Things that are not supported by our team.
- Do we create tickets and follow up on behalf of other teams?
- Who takes care of patching, license renewals, hardware renewals, application upgrades, OS upgrades and server decommissioning (VM and hardware)?
- How often do we send notification emails to teams when we perform a maintenance activity? Ex: an email 3–4 days before, 1 day before and 30 minutes before the maintenance
- Monthly report – disk space report generation (using PowerCLI, etc.), hardware checks for any issues, DC replication status, etc.
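The monthly disk-space report can be sketched in Python once the per-volume numbers have been collected (for example, exported from PowerCLI or a monitoring tool). This is a minimal sketch, assuming a hypothetical `Volume` record with sizes in GB and an illustrative 15% free-space threshold; neither comes from any particular tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    """Free-space figures for one volume on one server (hypothetical record layout)."""
    server: str
    volume: str
    total_gb: float
    free_gb: float

    @property
    def free_pct(self) -> float:
        # A zero-size volume is treated as 0% free so that it gets flagged for review.
        if self.total_gb <= 0:
            return 0.0
        return 100.0 * self.free_gb / self.total_gb


def low_space_report(volumes: List[Volume], threshold_pct: float = 15.0) -> List[str]:
    """Return one line per volume below threshold_pct free, fullest volume first."""
    low = sorted((v for v in volumes if v.free_pct < threshold_pct), key=lambda v: v.free_pct)
    return [
        f"{v.server:<12} {v.volume:<4} {v.free_gb:>8.1f} GB free of "
        f"{v.total_gb:.1f} GB ({v.free_pct:.1f}%)"
        for v in low
    ]


if __name__ == "__main__":
    # Illustrative values only; feed in the real export instead.
    sample = [
        Volume("APP01", "C:", 100, 8),
        Volume("APP01", "D:", 500, 250),
        Volume("DB01", "E:", 1000, 120),
    ]
    for line in low_space_report(sample):
        print(line)
```

Sorting by free percentage puts the most urgent volumes at the top of the report; the threshold is a parameter so each team can use its own alerting level.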
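The notification cadence asked about in the checklist can be turned into concrete send times with a few lines of Python. This is a minimal sketch, assuming the example lead times from the checklist (4 days, 1 day and 30 minutes before the window); `notification_times` and `DEFAULT_LEAD_TIMES` are hypothetical names, not part of any tool.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence

# Example cadence from the checklist (4 days, 1 day and 30 minutes before);
# replace with whatever the team has agreed.
DEFAULT_LEAD_TIMES = (timedelta(days=4), timedelta(days=1), timedelta(minutes=30))


def notification_times(
    start: datetime,
    lead_times: Sequence[timedelta] = DEFAULT_LEAD_TIMES,
    now: Optional[datetime] = None,
) -> List[datetime]:
    """Return when each maintenance notification should go out, earliest first.

    If `now` is given, send times that have already passed are dropped.
    """
    times = sorted(start - lead for lead in lead_times)
    if now is not None:
        times = [t for t in times if t >= now]
    return times


if __name__ == "__main__":
    window_start = datetime(2024, 3, 16, 22, 0)  # example maintenance window
    for t in notification_times(window_start):
        print(t.strftime("%Y-%m-%d %H:%M"))
```

Passing `now` is useful when a maintenance is scheduled at short notice: notifications whose send time has already passed are skipped instead of being sent late.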
Below is a job description (JD) for a Technical Lead role at Tech Mahindra that I received recently. They are looking for 10+ years of experience and someone who can handle escalations from L2 and L3 engineers.
Job Summary / Tasks:
· Office hours: 8 x 5 (10:30 AM to 7:30 PM IST), plus out-of-office-hours support when required, compensated with comp-off.
· Work closely with onsite counterparts to test new solutions in POCs, and design and/or propose solutions.
· Manage projects and deliver them on time
· Follow defined Incident, Change and Problem Management processes.
· Critical Incident & Problem Resolution support
· Ensure all technical risks are mitigated, escalated or communicated in a timely manner.
· Act as a mentor for the rest of the team on all technical aspects of the systems in scope, and guide them toward best practices and the right architecture standards.
· Work closely with the manager to provide input on assigning technical activities (project as well as operational) based on the skill set and capability of resources.
· Provide timely feedback on the knowledge levels of team members, and coach/mentor them to upgrade their skills.
· Shift & resource planning
· Quality compliance and enhancements
· Service Improvement plans and initiatives
· Service Level reporting analysis & compliance
· Troubleshoot every case escalated from the L2/L3 teams, and look for opportunities to restore the service or provide an acceptable workaround that meets immediate business needs
· Read knowledge base articles or vendor documentation to stay up to date on the current technologies used within the infrastructure. Should be able to write documents on standards or procedures and train teams
· Use ticket-tracking software to record all work activity, timesheets, etc., following defined processes based on ITIL methodology
· Communicate via phone and/or email, follow up with global teams in Europe, America, Asia & Japan, and resolve incidents
· Analyze every proposal from L3 and approve or reject it
· If an incident or ticket cannot be solved immediately, follow the escalation procedure, assign the ticket to the appropriate team, and coordinate the resolution
· Handle calls from server and application support teams and provide accurate information for quick restoration of service
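Service level reporting (and the SLA/OLA item in the earlier checklist) usually comes down to comparing each ticket's resolution time with the target for its priority. This is a minimal sketch, assuming hypothetical per-priority targets and a simple ticket record; `SLA_TARGETS`, `Ticket` and `sla_compliance` are illustrative names, and the real targets come from the contract.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Mapping, NamedTuple

# Hypothetical resolution targets per priority; use the values from the actual SLA/OLA.
SLA_TARGETS: Mapping[str, timedelta] = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=8),
    "P3": timedelta(days=3),
}


class Ticket(NamedTuple):
    ticket_id: str
    priority: str
    opened: datetime
    resolved: datetime


def sla_compliance(
    tickets: Iterable[Ticket], targets: Mapping[str, timedelta] = SLA_TARGETS
) -> Dict[str, float]:
    """Percentage of tickets resolved within their priority's target, per priority."""
    met: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for t in tickets:
        # A KeyError here means a ticket has a priority with no agreed target.
        target = targets[t.priority]
        total[t.priority] += 1
        if t.resolved - t.opened <= target:
            met[t.priority] += 1
    return {p: 100.0 * met[p] / total[p] for p in total}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [  # illustrative tickets only
        Ticket("INC001", "P1", t0, t0 + timedelta(hours=3)),
        Ticket("INC002", "P1", t0, t0 + timedelta(hours=6)),
        Ticket("INC003", "P3", t0, t0 + timedelta(days=2)),
    ]
    print(sla_compliance(sample))
```

Resolution time here is plain wall-clock time. Many contracts measure business hours or pause the clock while waiting on the customer, which this sketch does not model.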
Required IT Skills
· Server Hardware (Blade, KVM, Blade Enclosure…)
· Resource Management (Disks, CPU, Memory, etc.)
· RAID and Array Controller Concepts
· Firmware/Driver Management
· Installation & Configuration
· In-depth Windows Server Administration & Troubleshooting
· In-depth knowledge of Performance Management & Tuning
· Security Management (Policies, Audit, Logs)
· NTFS & Access Management
· DNS/WINS & DHCP Management
· Printer & Patch Management
· Upgrade & Migration of Servers
· Capacity and Availability Management
· In-depth Diagnostics & Analysis (Dump Analysis…) skills
· Windows Resource Kit & Support Tools
· DFS & DFSR Concepts (hands-on experience required)
· Remote Management Tools (MSTSC / RDP Client / ICA, DSView 4, HP iLO and PsTools)
· In-depth knowledge of Windows (2008/2012) Cluster Services (Software, Hardware, Network Requirements, Shared Disk, etc.)
· Installation and Setup of Cluster Services
· Configuring the Quorum Disk, Shared Disk, Heartbeat, etc.
· Managing Cluster Resources (Failover, Failback, etc.)
· Good troubleshooting skills on Microsoft Clustering
· Upgrade & Migration of Clusters
Server Virtualization (VMware)
· In-depth knowledge of VMware vSphere 5.x/6.x
· In-depth knowledge of Virtual Infrastructure Management (VC, ESXi Host & VI Client)
· HA & DRS
· VM Backup Technologies (e.g., SnapManager for Virtual Infrastructure from NetApp)
· Knowledge of VMware vRealize
· Knowledge of VMware vSAN
Scripting & Automation / WMI
· Batch Files
· PowerShell Scripts
· Azure Automation – Good to Have (basic skill set)
SAN & Storage Concepts
· HP EVA, NetApp (or any enterprise storage solution) SAN Infrastructure
· Fabric Switch Management, Zoning
· Storage Provisioning
· DR Management
· Network Load Balancing & Teaming
· TCP/IP (v4 & v6) Concepts
· VLANs / DMZs & Networking Concepts
Backup & Recovery
· Backup & Restore Management with any enterprise backup solution, such as HP Data Protector or Commvault (Simpana)
· Overall concepts must be clear
Packaging & Remote Deployment Tools
· Microsoft Installer/AD Deployment
· Patch Management (SCCM or WSUS)
Active Directory (2008 R2/2012 R2)
· Active Directory concepts and best practices
· Users & Groups Management
· Sites & Services Management
· Group Policy Management
· Security Policies
Microsoft Operations Manager / SCOM 2012
· SCOM Management (Operators Console)
· Management Pack Configuration & Development
HP Systems Insight Manager
· HP SIM Console
· Version Control Agent & Upgrade
· Citrix MetaFrame / XenApp
· Citrix Management Console
· Publishing Applications (incl. Wise Packaging, Load Balancing…)
· Performance Management
· Apache & Tomcat
· SQL Server Administration
· Oracle Server Administration
· McAfee ePO Management
· Must have a good understanding of, and hands-on experience with, SLAs, OLAs and Underpinning Contracts for providing infrastructure services.
· ITIL/PMP/Six Sigma best practices.