“My Cluster”…Making RAM faster with every simulation (hopefully).

I use both the words “My” and “Cluster” in quite a loose sense, mainly because it’s not mine and it’s not quite a cluster. It is owned by Brian Davis, one of my professors here at Tech, and funded by grant money from a project he was awarded a few years ago. I just administer it.

Essentially, the grant (and the simulations) are about finding a better way to organize memory. In the conventional memory hierarchy, a program’s data is served first from the CPU cache, then from RAM, and then from the hard disk as a ‘last resort’. Got it. Right.

What he’s aiming to do is add another layer between the RAM and the HD (and, in the future, between the RAM and the CPU cache), with an algorithm that organizes it both physically (reducing copper wire latency) and programmatically (by page size as well as by frequency of use). I’m not too sure about the specifics past that, and it is only one branch of what he’s actually researching, but it struck me as quite interesting!

I’m hoping at some point to get proper job management going, as well as “actual clustering”. Right now people just ssh into the head node, and there is a trust set up between the head and the nodes so they don’t have to log in again after they’ve authenticated through the head. The nodes are pretty easy to start and compile code on, so it works for all the simulations, at least for now.

Every node is running Red Hat Enterprise Linux 5, and the simulations are written in C. I’ve got everything set up to kickstart whenever I need it. Take tonight, for example: my nodes have been mysteriously dying, and I have a feeling that someone’s code corrupted something in the T3 module and they just kept trying to run it on all the nodes.

Thankfully I have set it up so it’s nice and easy to wipe everything clean. Thank god for kickstart; this is a copy of mine:


Running through my kickstart, you can see that it puts every node on the same NIS domain (btdpool.ee.mtu.edu) and uses an FTP server to grab all the necessary packages and files. There is no X Windows, as there isn’t a need for it. It sizes the swap partition according to the amount of memory and fills the rest of the disk with ext3 (forget LVM!). It installs all the packages it really needs, and in the post-install it does the following:
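To make the “swap depends on memory size” part concrete, here is a rough sketch of how a kickstart %pre script could work it out by hand. This is not my actual kickstart: the function names, the sizing rule (twice the RAM up to 2 GB, RAM plus 2 GB above that) and the output path are all assumptions for illustration. Kickstart’s own `part swap --recommended` does the sizing for you, so a real file may not need any of this.

```shell
#!/bin/bash
# Sketch only: the function names and the sizing rule are assumptions,
# not taken from the real kickstart file.

# Print a swap size in MB for a given amount of RAM in MB.
# Assumed rule: 2x RAM up to 2048 MB, otherwise RAM + 2048 MB.
swap_mb_for_mem_mb() {
    local mem_mb="$1"
    if [ "$mem_mb" -le 2048 ]; then
        echo $(( mem_mb * 2 ))
    else
        echo $(( mem_mb + 2048 ))
    fi
}

# Write kickstart partition lines to the file named in $1.
# $2 is the meminfo file to read (defaults to /proc/meminfo).
write_part_include() {
    local out="$1" meminfo="${2:-/proc/meminfo}"
    local mem_kb mem_mb
    # MemTotal is reported in kB; convert to whole MB.
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    mem_mb=$(( mem_kb / 1024 ))
    {
        echo "part swap --size=$(swap_mb_for_mem_mb "$mem_mb")"
        # ext3 root that grows to fill whatever is left (no LVM).
        echo "part / --fstype ext3 --size=1 --grow"
    } > "$out"
}
```

In a kickstart file, the %pre section would call something like `write_part_include /tmp/part-include` (a hypothetical path), and the main section would pull the result in with `%include /tmp/part-include`.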

Changes the passwd binary for all nodes to yppasswd, so when people try to change their password, it propagates properly through the NIS domain.

Adds “+::::::” to /etc/passwd. This tells the node that any user not found in the local file should be looked up in “Yellow Pages” (or the NIS server, whichever you prefer :-P )

Adds “+:::” to /etc/group, which does exactly the same thing as the previous step, but for groups instead of users.

Puts an entry in /etc/fstab to mount the NFS share I have running for everyone’s home directory, which keeps the SSH trusts in place no matter how many times I wipe the nodes :).
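The four post-install steps above could be sketched in shell roughly like this. It is an illustration, not the script from my kickstart: the function names, the NFS server name, the export path, the /home mount point and the /usr/bin locations of passwd and yppasswd are all assumptions. The function takes a root directory so it can be tried against a scratch directory before it goes anywhere near a real node.

```shell
#!/bin/bash
# Sketch only: the server name, export path, mount point and binary
# locations below are hypothetical placeholders.

# Append line $1 to file $2 unless an identical line is already there.
append_once() {
    local line="$1" file="$2"
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Apply the post-install steps under the root directory given in $1
# (defaults to /). Safe to run more than once.
node_post_install() {
    local root="${1:-/}"
    local nfs_server="${NFS_SERVER:-head.example}"   # hypothetical
    local nfs_export="${NFS_EXPORT:-/export/home}"   # hypothetical
    root="${root%/}"

    # 1. Make "passwd" run yppasswd so password changes go through NIS.
    #    The original binary is kept as passwd.local.
    if [ -e "$root/usr/bin/yppasswd" ] && [ -f "$root/usr/bin/passwd" ] \
        && [ ! -L "$root/usr/bin/passwd" ]; then
        mv -f "$root/usr/bin/passwd" "$root/usr/bin/passwd.local"
        ln -s yppasswd "$root/usr/bin/passwd"
    fi

    # 2. and 3. Send user and group lookups on to NIS.
    append_once "+::::::" "$root/etc/passwd"
    append_once "+:::" "$root/etc/group"

    # 4. Mount the shared home directories over NFS.
    append_once "$nfs_server:$nfs_export /home nfs defaults 0 0" "$root/etc/fstab"
}
```

Running `node_post_install /tmp/fakeroot` against a directory that holds copies of the files is a cheap way to check the result. Inside a kickstart %post section, which already runs in the installed system, it would be called with no argument.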

These things seem relatively simple…now. Trust me, it took me many, many hours to figure out exactly what I needed to do to get everything working like this. Now all I have to do to “refresh” the nodes is PXE boot them and reboot after they’re done. Since Nagios only checks SSH and ICMP connectivity on each node, the monitoring keeps working after a wipe without any changes.

All in all, it has been a huge learning experience for me, and I’ve gotten to know a lot about RHEL administration. It’s been a lot of fun, and at some point I’ll get into all the scripting that I did to make everything work flawlessly :).

I’ve got a lot of things happening using cron and bash scripting, but I don’t feel like getting into that right now.

Also look at the blast from the past that I found today at work!






I saw it sitting there and literally burst out laughing. There was a huge cart of old crap sitting on the left when I walked in and I spent about 20 min digging through it for nostalgic things like this :).

