As outsourced CTOs we often get called on to interview System Administrators on behalf of our clients. The following is a series of questions and potential answers you may find helpful if you are planning to interview someone, or are about to be interviewed for such a position.
Keep in mind that many of these questions do not have an exact answer – so to help you we’ve also tried to explain the methodology & rationale behind the types of answers you should be looking for.
How do I deploy a server for a single web application?
You need to deploy a single server for a client’s web application. Assume that you have the ability to spin up a server from scratch – walk us through what you do.
Most answers are fine so long as they lead to a stable and secure environment that is ready for code deployment. Bonus points if they consider optimizing Apache and MySQL for the web application. If they bring up my.cnf – ask for specifics (e.g. query_cache_size or join_buffer_size; on a high-memory box where the app uses InnoDB, frighteningly large InnoDB settings such as innodb_buffer_pool_size come into play for performance).
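To illustrate the kind of specifics worth listening for, here is a minimal Python sketch that renders a [mysqld] fragment for an InnoDB-heavy app on a dedicated database host. The 0.7 buffer-pool fraction and the join buffer value are assumptions for illustration (a commonly cited rule of thumb is to give the buffer pool a large majority of RAM on a dedicated box), not tuned recommendations. Note that query_cache_size only applies to MySQL versions before 8.0, which removed the query cache.

```python
def render_mysqld_fragment(total_ram_mb: int, buffer_pool_fraction: float = 0.7) -> str:
    """Render an illustrative [mysqld] section for an InnoDB-heavy web app.

    Assumes a dedicated database host; buffer_pool_fraction is an assumed
    rule of thumb, not a tuned value.
    """
    if not 0 < buffer_pool_fraction < 1:
        raise ValueError("buffer_pool_fraction must be between 0 and 1")
    pool_mb = int(total_ram_mb * buffer_pool_fraction)
    return "\n".join([
        "[mysqld]",
        # The "frighteningly large" setting: InnoDB's data and index cache.
        f"innodb_buffer_pool_size = {pool_mb}M",
        # Illustrative value; a join buffer is allocated per join, so keep it modest.
        "join_buffer_size = 256K",
    ])


print(render_mysqld_fragment(32768))
```

A candidate who understands why the buffer pool can be huge while per-join buffers must stay small is showing exactly the experience this question is probing for.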
Warning Signs: They do not mention security considerations at all.
Same question, only now you have to set up a small cluster.
They should be asking you questions at this point. It is not terrible if they do not, but better, more confident SysAdmins will ask for specifics or state their assumptions. That said, since this is an interview, they may simply forge ahead and tackle it as a general example.
Create the web servers. Create the DB server. Place a load balancer in front of the web servers. Determine a method for syncing data between the web servers: ideally shared storage on a SAN; less ideally, rsync.
Ask them how they would scale a single server up to a cluster, and what considerations need to be made.
Answer – we’re probing past overly general answers to uncover whether they have experience working in tandem with developers on these types of changes, e.g. “Change the web apps to use an internal IP for the database server.”
What load balancing method would you start out using, and why?
Answers – if they have not set up a load balancer, you will get a blank stare. The idea is to uncover the difference between knowing what something is and having actually worked with it.
E.g. round robin, least connections, least sessions, predictive, historical, etc.
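Someone who has actually configured a load balancer should be able to explain the difference between the first two methods. As a minimal Python sketch (not how any particular load balancer implements them), round robin ignores load entirely, while least connections tracks open connections per backend; the backend names are hypothetical.

```python
from itertools import cycle


class RoundRobin:
    """Hand requests to each backend in turn, ignoring current load."""

    def __init__(self, backends):
        self._cycle = cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Hand each request to the backend with the fewest open connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # min() returns the first backend in insertion order on ties.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1


rr = RoundRobin(["web1", "web2"])
print([rr.pick() for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']

lc = LeastConnections(["web1", "web2"])
first = lc.pick()   # web1
second = lc.pick()  # web2
lc.release(first)   # web1 finishes its request
print(lc.pick())    # web1
```

Round robin is a reasonable starting point when requests are roughly uniform; least connections copes better when some requests (reports, checkouts) hold a connection much longer than others.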
What is PII, and why is it important?
Answer: PII = Personally Identifiable Information. It is a name, email address, Social Security number, physical address or other information indicating the identity of a user/person.
The importance is that companies, site owners and clients have legal obligations to protect this information. The above by itself doesn’t require encryption; however, the above in conjunction with a payment method (checking/banking/credit card numbers) does. Since we work with clients’ systems, and clients often do not fully understand the requirements they should be adhering to for protected information, it is important that everyone understands its significance and can recognize PII, HIPAA-protected data and other classes of information when they see it.
WA State defines PII here: http://apps.leg.wa.gov/RCW/default.aspx?cite=19.255.010
(5) For purposes of this section, “personal information” means an individual’s first name or first initial and last name in combination with any one or more of the following data elements, when either the name or the data elements are not encrypted:
(a) Social security number;
(b) Driver’s license number or Washington identification card number; or
(c) Account number or credit or debit card number, in combination with any required security code, access code, or password that would permit access to an individual’s financial account
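To make the structure of that definition concrete (a name plus at least one data element, unless both are encrypted), here is a minimal Python sketch. It is a simplified, non-authoritative reading of the quoted text, and the Record layout and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical customer record; field names are illustrative only."""
    first_name: str = ""  # a first initial alone also counts
    last_name: str = ""
    ssn: str = ""
    drivers_license: str = ""
    account_number: str = ""  # account, credit or debit card number
    access_code: str = ""  # security code, access code or password
    encrypted: set = field(default_factory=set)  # names of encrypted fields


def is_personal_information(r: Record) -> bool:
    """Simplified reading of the RCW 19.255.010(5) text quoted above."""
    if not (r.first_name and r.last_name):
        return False
    name_encrypted = {"first_name", "last_name"} <= r.encrypted

    # Each qualifying data element, expressed as the fields it is made of.
    elements = []
    if r.ssn:
        elements.append({"ssn"})
    if r.drivers_license:
        elements.append({"drivers_license"})
    if r.account_number and r.access_code:
        elements.append({"account_number", "access_code"})

    # It counts unless both the name and the data element are encrypted.
    return any(not (name_encrypted and e <= r.encrypted) for e in elements)


print(is_personal_information(Record("Jane", "Doe", ssn="000-00-0000")))  # True
print(is_personal_information(Record("Jane", "Doe", account_number="0000")))  # False
```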
WA State defines financial account data here: http://apps.leg.wa.gov/RCW/default.aspx?cite=19.255.020
(a) “Account information” means: (i) The full, unencrypted magnetic stripe of a credit card or debit card; (ii) the full, unencrypted account information contained on an identification device as defined under RCW 19.300.010; or (iii) the unencrypted primary account number on a credit card or debit card or identification device, plus any of the following if not encrypted: Cardholder name, expiration date, or service code.
System Administrators are responsible for ensuring that server-side security receives as much attention as the application layer.
Investigating Gaps in Server Logs
You are the SysAdmin for a heavily trafficked ecommerce website. During a routine inspection of a web server log you notice a 5-minute gap in the Apache access log. Why is this significant? How do you investigate?
Either the logging daemon stopped (unlikely, and easy to determine), the web server was down, or a load balancer stopped directing traffic to it. More seriously, someone may have edited the log.
Absent the above reasons – you treat this as a potential security breach.
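Spotting such a gap does not have to rely on a routine visual inspection. A minimal Python sketch that scans Apache access-log lines (common/combined format timestamps such as [10/Oct/2000:13:55:36 -0700]) and reports consecutive entries that are at least five minutes apart; the threshold and the sample lines are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp in Apache common/combined log format.
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (previous, current) timestamps wherever consecutive entries
    are at least `threshold` apart."""
    prev = None
    for line in lines:
        m = TS_RE.search(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if prev is not None and ts - prev >= threshold:
            yield prev, ts
        prev = ts


sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /cart HTTP/1.1" 200 734',
    '10.0.0.3 - - [10/Oct/2023:14:01:02 +0000] "GET / HTTP/1.1" 200 512',
]
for start, end in find_gaps(sample):
    print(f"gap of {end - start} between {start} and {end}")
```

On a heavily trafficked site the threshold can be much tighter; a gap the code finds still has to be explained by one of the benign causes above before you rule out tampering.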
You SSH into a server under your control and see the following. What is this, what is it used for, and where is it controlled?
This example is a MOTD (message of the day) banner. Messages can be defined to display either before or after login.
There are two ways to display them: the issue.net file and the motd file.
issue.net: displays a banner message before the password login prompt. For SSH logins it is only shown if the Banner directive in /etc/ssh/sshd_config points to it.
motd: displays a banner message after the user has logged in.
These are used to present legal notices or welcome messages, or to help identify servers; e.g. having multiple command-line sessions open can get confusing.
If you are not expecting a message or it has changed – you treat this as a potential security breach.
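Noticing that a banner has changed is easier with a recorded baseline. A minimal Python sketch that compares SHA-256 digests of the banner files against a stored baseline; the default paths are the usual locations, the demo uses a temporary file, and ideally the baseline is kept off the server, since an intruder who can edit the motd could edit a local baseline too. In practice a file-integrity tool such as AIDE or Tripwire covers this more thoroughly.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_banners(baseline, paths=("/etc/motd", "/etc/issue.net")):
    """Return the banner files whose current digest differs from the recorded
    baseline: a file that changed, appeared, or disappeared."""
    changed = []
    for p in paths:
        try:
            current = sha256_of(p)
        except FileNotFoundError:
            current = None
        if baseline.get(p) != current:
            changed.append(p)
    return changed


# Demo against a temporary file rather than the real /etc/motd.
demo = Path(tempfile.mkdtemp()) / "motd"
demo.write_text("Authorized use only.\n")
baseline = {demo: sha256_of(demo)}
print(changed_banners(baseline, paths=(demo,)))  # []
demo.write_text("Something else entirely.\n")
print(changed_banners(baseline, paths=(demo,)))  # the demo file is now flagged
```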
Tell us how you have screwed something up in a production environment and what you did to fix it.
Anything except the typical BS “Here’s a screw-up that was actually not a screw-up…” answer.
Ask what they did afterwards to prevent something like that from happening again.
Name two package management utilities you’ve used and explain how you have used them.
YUM & APT – front-end interfaces to RPM and dpkg respectively (CentOS/Red Hat and Debian).
pkgtool & slackpkg for Slackware, pacman for Arch Linux and emerge for Gentoo.
Ask for specifics to weed out “I’ve heard about this” versus “I’ve actually used this, and here’s how.”
Users are reporting that the website seems slow. You confirm with a visual inspection. How do you diagnose this further?
SSH into the server or log in to the control panel. Run “top” to see processes and check memory and CPU usage. Check Apache or Nginx processes for easy-to-find culprits. Run “netstat” to find excessive connections from a single IP or IP range. Check MySQL processes for table locks or excessive waits. Determine whether any database tables are crashed or corrupted.
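The netstat step is usually a one-liner along the lines of netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n. A minimal Python sketch of the same counting, assuming netstat -ntu style output with the remote address in the fifth column; the sample output is made up. On newer distributions ss has largely replaced netstat, and its column layout differs.

```python
from collections import Counter


def connections_per_ip(netstat_output: str, min_count: int = 1):
    """Count connections per remote IP from `netstat -ntu`-style output,
    busiest first."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data lines start with the protocol (tcp, tcp6, udp, udp6).
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        ip = fields[4].rsplit(":", 1)[0]  # strip the port; also works for IPv6
        counts[ip] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= min_count]


sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:80             203.0.113.7:51234       ESTABLISHED
tcp        0      0 10.0.0.5:80             203.0.113.7:51235       ESTABLISHED
tcp        0      0 10.0.0.5:80             198.51.100.2:40000      TIME_WAIT
"""
print(connections_per_ip(sample))  # [('203.0.113.7', 2), ('198.51.100.2', 1)]
```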
More Experienced Answer:
Using the above as a baseline for whether the server is under load and what the likely culprit is, you would next want to determine whether the slowness is actually caused by the server or is experienced primarily on the client side. Since you visually confirmed that the site seems slow, you are operating on the assumption that this holds true for all current visitors. To separate client side from server side, run a speed test and look at the load times for all external calls to assets (scripts, objects, CDN, etc.). Some poorly constructed sites are built so that a long wait on a single asset keeps the page from fully loading. This is most often seen with tracking scripts and ad-server calls.
Specific Command questions:
Describe what the “ps” command does. When would you use it?
ps = process status. It shows currently running processes and their PIDs. It can also show processes for a particular user, for all users, or as a hierarchy (tree).
Why is PID important?
Answer = you must know the PID of a process in order to run commands against it, e.g. “kill”.
Describe what the “awk” command does. When would you use it?
AWK is basically a command-line filter that is most often used to reformat the output of other commands.
How would you use AWK in conjunction with PS in a real-life scenario?
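Answer: When you need to narrow the output down to processes with a certain string, or a certain value in a specific column.
ps aux | grep tomcat would return processes with “tomcat” anywhere in the line (owner, process or path), including the grep process itself. awk lets you match a specific column instead, e.g. ps aux | awk '$1 == "tomcat" {print $2}' prints only the PIDs of processes owned by the user tomcat.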
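The difference between matching a whole line and matching a column is the heart of this question. A minimal Python sketch that mimics an awk column match on ps aux output: split each line on whitespace (as awk does by default) and compare one 1-based column, returning the PIDs from column 2. The sample output is made up.

```python
def pids_where(ps_aux_output: str, column: int, value: str) -> list:
    """Return PIDs from `ps aux` output whose 1-based `column` equals `value`,
    mirroring awk '$N == "value" {print $2}' (whitespace-split fields)."""
    pids = []
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split()  # same default field splitting as awk
        if len(fields) >= column and fields[column - 1] == value:
            pids.append(int(fields[1]))  # PID is the second column
    return pids


sample = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
tomcat    1234  2.0  8.1 123456 65432 ?        Sl   09:00   1:23 /usr/bin/java -Dcatalina.base=/opt/tomcat
root      2345  0.0  0.1   6440   720 pts/0    S+   09:05   0:00 grep tomcat
"""
print(pids_where(sample, 1, "tomcat"))  # [1234]
```

A plain substring match (what grep does) would return both lines here, because the grep process's own command line contains “tomcat”.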
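Describe how rm -rf can ruin your day.
Answer – it recursively deletes everything under the path(s) you give it, without prompting for confirmation. Run in the wrong directory (e.g. rm -rf *) or with a mistyped path, it can wipe out far more than you intended. This is a legitimate command and there are plenty of reasons to use it.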
It’s your first week on the job and you’ve been asked to remove a previous SysAdmin’s user account. How do you do this?
Since you do not know how user removal is configured, you had better check /etc/deluser.conf first (on Debian-based systems) to make sure it has not been configured to remove the user’s home directory (REMOVE_HOME) or all files owned by the user (REMOVE_ALL_FILES) when deluser runs.
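That check can be scripted. A minimal Python sketch, assuming the simple KEY = VALUE format with # comments used by Debian’s /etc/deluser.conf, that reports whether REMOVE_HOME or REMOVE_ALL_FILES is switched on; the sample file content is made up. Remember that command-line options such as deluser --remove-home apply regardless of what the config file says.

```python
from pathlib import Path

# deluser.conf options that make deluser delete data (Debian-family systems).
RISKY_KEYS = ("REMOVE_HOME", "REMOVE_ALL_FILES")


def parse_deluser_conf(text: str) -> dict:
    """Parse KEY = VALUE lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value.strip('"')
    return settings


def risky_settings(text: str) -> list:
    """Return the destructive options that are switched on (value 1)."""
    settings = parse_deluser_conf(text)
    return [key for key in RISKY_KEYS if settings.get(key) == "1"]


def check_deluser_conf(path: str = "/etc/deluser.conf") -> list:
    return risky_settings(Path(path).read_text())


sample = """\
# excerpt in the style of /etc/deluser.conf
REMOVE_HOME = 1
REMOVE_ALL_FILES = 0
BACKUP = 0
"""
print(risky_settings(sample))  # ['REMOVE_HOME']
```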
Other Command Q’s:
How does signal to noise ratio figure into server administration and network security?
Answer – it is about not wasting time chasing false positives, and being able to narrow your focus to real problems. Compare 50 quick login failures using random usernames from a botnet with 4 slow failures every 31 minutes against a real, specific username; the latter indicates that the attacker has some data on your environment.
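The example above can be turned into a filter. A minimal Python sketch that discards failures against nonexistent accounts (the botnet noise) and flags real accounts whose failures are few and deliberately spaced out; the thresholds, usernames and timestamps are assumptions for illustration, not tuned detection rules.

```python
from collections import defaultdict


def targeted_accounts(failures, valid_users, min_failures=3, min_spacing=600):
    """Flag real accounts whose failed logins are slow and steady.

    failures: iterable of (unix_time, username) for failed logins.
    valid_users: usernames that actually exist on the box.
    Thresholds are illustrative assumptions, not tuned values.
    """
    by_user = defaultdict(list)
    for ts, user in failures:
        if user in valid_users:  # random botnet names are noise
            by_user[user].append(ts)

    flagged = []
    for user, times in by_user.items():
        times.sort()
        gaps = [b - a for a, b in zip(times, times[1:])]
        # Enough attempts, each at least min_spacing seconds apart.
        if len(times) >= min_failures and gaps and min(gaps) >= min_spacing:
            flagged.append(user)
    return flagged


noise = [(i, f"user{i}") for i in range(50)]           # 50 random names in 50 s
signal = [(k * 31 * 60, "dbadmin") for k in range(4)]  # 4 tries, 31 min apart
print(targeted_accounts(noise + signal, valid_users={"dbadmin", "deploy"}))  # ['dbadmin']
```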
How would you use tailwatch?
To watch the end of log files so you do not have to constantly download them to view.
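The underlying idea is following a log as it grows rather than re-downloading it; with standard tools that is tail -f, or tail -F to keep following the file by name across log rotation. A minimal poll-based Python sketch of the same idea; the demo uses a throwaway file standing in for a real log.

```python
import os
import tempfile


class LogFollower:
    """Poll-based sketch of `tail -f`: each poll() returns only the complete
    lines appended since the previous call."""

    def __init__(self, path, from_end=True):
        self.path = path
        # Start at the current end of the file, like `tail -f` after its
        # initial output.
        self.offset = os.path.getsize(path) if from_end else 0

    def poll(self):
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0  # file was truncated; start from the top
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        end = data.rfind(b"\n") + 1  # leave a trailing partial line for later
        self.offset += end
        return data[:end].decode(errors="replace").splitlines()


# Demo with a throwaway file standing in for a real log.
log = os.path.join(tempfile.mkdtemp(), "access.log")
with open(log, "w") as f:
    f.write("old line\n")
follower = LogFollower(log)
with open(log, "a") as f:
    f.write("new line 1\nnew line 2\npartial")
print(follower.poll())  # ['new line 1', 'new line 2']
```

In real use you would call poll() in a loop with a short sleep. This sketch handles truncation but not rename-based rotation, which tail -F handles by reopening the file by name.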
Give me some examples of how you use “grep”.
Answer – looking for a string of text in a file, in certain files or locations, or across the entire server. Grepping the entire server is a bad answer: it will take forever and may put the server under load. It’s a lazy way to do this.
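The better answer scopes the search to the files that can actually contain the string. A minimal Python sketch of a scoped, recursive search, roughly what GNU grep’s grep -rn --include='*.log' PATTERN DIR does (Python regular expressions differ in detail from grep’s); the directory and file names in the demo are made up.

```python
import re
import tempfile
from pathlib import Path


def grep(pattern, root, glob="*.log"):
    """Yield (path, line_number, line) for lines matching `pattern` in files
    under `root` whose names match `glob`, scoped to one directory tree."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        with path.open(errors="replace") as f:
            for number, line in enumerate(f, 1):
                if regex.search(line):
                    yield str(path), number, line.rstrip("\n")


# Demo on a throwaway directory tree instead of a real server path.
root = Path(tempfile.mkdtemp())
(root / "app").mkdir()
(root / "app" / "error.log").write_text("ok\nPHP Fatal error: x\n")
(root / "app" / "notes.txt").write_text("PHP Fatal error in a text file\n")
hits = list(grep(r"Fatal error", root))
print(hits)
```

Only error.log is searched, so the matching line in notes.txt is never read; that is the whole point of scoping by location and file type.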
There are dozens and dozens of questions we ask in a typical interview session. The above helps you figure out rather quickly whether the person you are interviewing has experience and good instincts, or just knows some commands.