Query all non-subscribed RHEL7 repos at once

The old Red Hat Network was simple and easy to use. The RHN website presented a list of systems in your web browser, with counts of outstanding patches and outdated packages. You could click on a specific system name and do various things like subscribe to specific repositories (channels) etc.

The current Red Hat Network is a glittering javascript tour-de-force that multiplies the number of clicks and the amount of specialized knowledge you will need to manage your systems. You can pay extra for add-on capabilities such as the ability to select groups of systems and apply a set of operations to all of them, which is almost certainly necessary if you have a large number of systems. It’s a sad travesty of the much-maligned system it replaced.

If you’re completely entangled in the new RHN with your Red Hat Enterprise Linux 7 systems (by which I mean that you haven’t managed to exit the Red Hat ecosystem for a more cost-effective infrastructure yet) you might want to do something like figure out which of the various poorly named repos (such as -extras, -optional, and -supplementary) contains some particular package you want.

Command line to the rescue! Ignore all RHN’s useless beauty and use ugly, reliable Gnu awk. This, for example, finds the repo where the git-daemon package has been hidden away.

subscription-manager repos --list | gawk '/^Repo ID/{print "yum --showduplicates list available --disablerepo=\"*\" --enablerepo=" $3}' | bash | grep -i git-daemon

After several minutes (there’s a lot of network traffic involved) you’ll find that versions of git-daemon are in five different repos.

git19-git-daemon.x86_64 1.9.4-2.el7 rhel-server-rhscl-7-eus-rpms
git19-git-daemon.x86_64 1.9.4-3.el7 rhel-server-rhscl-7-eus-rpms
git19-git-daemon.x86_64 1.9.4-3.el7.1 rhel-server-rhscl-7-eus-rpms
git-daemon.x86_64 1.8.3.1-5.el7 rhel-7-server-optional-fastrack-rpms
git-daemon.x86_64 1.8.3.1-4.el7 rhel-7-server-optional-rpms
git-daemon.x86_64 1.8.3.1-5.el7 rhel-7-server-optional-rpms
git-daemon.x86_64 1.8.3.1-6.el7 rhel-7-server-optional-rpms
git19-git-daemon.x86_64 1.9.4-2.el7 rhel-server-rhscl-7-rpms
git19-git-daemon.x86_64 1.9.4-3.el7 rhel-server-rhscl-7-rpms
git19-git-daemon.x86_64 1.9.4-3.el7.1 rhel-server-rhscl-7-rpms
git-daemon.x86_64 1.8.3.1-5.el7 rhel-7-server-optional-beta-rpms

So, you query the Red Hat Package Manager, rpm, to find out what version of git you have.

rpm -q git
1.8.3.1-6.el7

Since 1.8.3.1-6.el7 matches the latest version of git-daemon available from the rhel-7-server-optional-rpms repository, that’s the one you need to add in order to load git-daemon.

subscription-manager repos --enable rhel-6-server-optional-rpms
yum install git-daemon
.

This process is much easier than using the Red Hat Network web gui, and requires less specialized knowledge. Which is pretty sad, considering how arcane these incantations are.

rsyslog & systemd

The ancient Berkeley syslog is a functionally impoverished logging mechanism, but the protocol is well understood and widely supported. You can use a modern version of the daemon (Ranier’s rsyslog or syslog-ng for example) and work around the shortcomings of the protocol itself.

I’ve been working with a Red Hat Enterprise Linux version 7 spin-up, and since systemd is basically a Red Hat product it should come as no surprise that RHEL7 thoroughly embeds systemd.

Here’s a section of the documentation that describes how the error logging works:

Some versions of systemd journal have problems with database corruption, which leads to the journal to return the same data endlessly in a tight loop. This results in massive message duplication inside rsyslog probably resulting in a denial-of-service when the system resources get exhausted. This can be somewhat mitigated by using proper rate-limiters, but even then there are spikes of old data which are endlessly repeated. By default, ratelimiting is activated and permits to process 20,000 messages within 10 minutes, what should be well enough for most use cases.

annoying git

I’ve been installing git on some corporate servers with the idea of converting existing CVS and ad-hoc code management systems into something reasonably fast and modern.

It’s been somewhat tedious and painful, but supposedly once I’m done the installation will be stable and maintainable. For an enterprise SCM that’s a lot more important than ease of installation, at least in theory. (I ran OpenLDAP for a decade or more, so I can appreciate the value of putting all the pain up front.)

Today’s annoyance is that the gitolite documentation and web site refer to a “hosting user” but the toolset and other web sites describing gitolite installation talk about an “admin user”. After wasting several hours with Google trying to find out exactly what the difference was, I created a new user account for the admin user and executed the commands – at which point it became immediately obvious that THOSE ARE THE SAME DAMN THING.

Curse you, gitolite. I WANTED US TO BE FRIENDS.

Foswiki dependency hell

I really wanted to run Foswiki, because it seems like most of the TWiki devs ended up there, and because my employers want to run an enterprise wiki with fine-grained access and revision control driven from a corporate directory. Since Foswiki is written in perl, and Graham Barr’s excellent perl-LDAP modules can easily handle arbitrarily complex directory integration, I figured I’d just rip out all the code that checked users and groups against the Foswiki DB and replace it with appropriate LDAP calls, then send my mods upstream to the Foswiki devs. They seem like a good crowd, they’d probably appreciate a non-caching LDAP module.

But we’re heavily federally regulated, and we can’t run unmaintainable code. The number of unpackaged dependencies I’d need to run Foswiki on Red Hat Enterprise Linux is just unsupportable. I can’t find an audited, securely maintained package of File::Copy::Recursive, for example, anywhere. And there’s quite a few more (although some are available from EPEL).

I’d love to find a wiki engine that used real LDAP, instead of just caching copies of data retrieved by LDAP in a local database.

Fix uart boot errors on M1000e blade chassis

Somebody else figured it out in 2009, and I’m late to the party.

Basically, if you are running Red Hat Enterprise Linux (or one of it’s clone siblings) on a Dell M600 blade, you’ll need to modify the default BIOS settings in a non-intuitive way or you’ll get an error on every boot-up.

IRQ handler type mismatch for IRQ 12

Call Trace:
[] setup_irq+0x1b7/0x1cf
[] serial8250_interrupt+0x0/0xfe
[] request_irq+0xb0/0xd6
[] serial8250_startup+0x43d/0x5dc
[] uart_startup+0x76/0x16c
[] uart_open+0x19e/0x427
[] tty_open+0x1e8/0x3b0
[] chrdev_open+0x14d/0x183
[] open_namei+0x2be/0x6ba
[] chrdev_open+0x0/0x183
[] __dentry_open+0xd9/0x1dc
[] do_filp_open+0x2a/0x38
[] do_sys_open+0x44/0xbe
[] tracesys+0xd5/0xdf

I knew this was a serial port issue from the second and fifth lines of the trace, but I couldn’t figure out why it was involving IRQ 12, which is normally used for SCSI cards or PS/2 mice.

A moment of extreme computer geekery

Almost certainly of no interest to anyone.. well, maybe DNS experts who have occasional need of perl. Net::DNS::RR::CNAME->set_rrsort_func is pretty incredibly obscure, though.


#!/usr/bin/perl -w -T -W
#
#  DNS zone transfer and output CNAMEs sorted by target host
#  Charlie Brooks 2014-01-08

use Net::DNS;
use Net::DNS qw(rrsort);  # why don't I get this automatically?

my @domains=qw/typinganimal.net egbt.org hell.com/;

# Use system defaults from resolv.conf to find nameservers
my $res  = Net::DNS::Resolver->new;

foreach my $namespace (@domains) {

# do a zone transfer, loading resource records into array
# axfr is standard (BIND style) not djbdns style
  my @zone = $res->axfr($namespace);

# Red Hat's perl-Net-DNS-0.59-3.el5 package doesn't seem
# to have a useable rrsort for CNAMES (it tries to do a
# "<=>" flying saucer instead of "cmp") and the examples
# in the doco for custom sort methods flat out don't work
# but I flailed around until I found a way to do it.  It's
# weirdly simple if you stumble upon the magic incantation.

# dumping the CNAMEs sorted by target requires custom sort function
  Net::DNS::RR::CNAME->set_rrsort_func ('cnamet',
             sub {($a,$b)=($Net::DNS::a,$Net::DNS::b);
                  $a->{'cname'} cmp $b->{'cname'}});

  foreach my $cname (rrsort("CNAME","cnamet",@zone)) {
    $cname->print;
  }
}
exit;

RHEL6.5 has a recent AoE driver!

Version 6.5 of Red Hat Enterprise Linux includes version 83 (released May 2013) of Sam Hopkins’ canonical ATA-over-Ethernet kernel module.

Since RHEL6.4 was shipping version 47, which was six years out of date (and somewhat incompatible with current generations of AoE SAN appliances like the Coraid ESX and VSM) this is a huge improvement.

Red Hat Enterprise Linux 6.4 installs samba4-libs by default

I’m arguing with Red Hat again… the latest downloadable DVD of RHEL6 by default installs part of samba 4, which is supposed to be an unsupported “technology preview” and not a mainline package. In what world does it make sense for your flagship product, for which you sell expensive support contracts, to depend on a chunk of code you decline to support? How is that not bad craziness?

If you try to tear it out with rpm -e you’ll get sssd dependency errors. And ghods, I hate the way RHEL6 and up basically force you to run half-baked name and authentication service caching daemons – my networks worked faster and better without caching, because we actually had a high performance LDAP infrastructure that didn’t need such Microsofty complications. But that’s another rant entirely.

ANYway, if you say OK I will upgrade to Samba 4 to avoid dependency hell, you trigger bug 984727 which Red Hat has set to CLOSED WONTFIX.

Update: Andreas Schneider of Red Hat and the Samba Team has clarified the matter. Since FreeIPA (Red Hat’s Active Directory implementation) and sssd (Red Hat’s new authentication daemon, much like PADL’s PAM and NSS modules only rawer and more oriented toward caching) both require the samba4-libs library in RHEL6, that single package is now officially supported – although version 4 of the Samba Suite is otherwise still a “technology preview”.

dkms rpm for aoe v79 working… sorta…

Sunil Gupta at Dell spotted the reason that DKMS didn’t like v79 of Coraid’s ATA-over-Ethernet driver – the module info was buggered up. Looking at the sources, it appears that the guys over at Coraid ran into some compiler warnings they wanted to get rid of that were coming out of Rusty Russell‘s MODULE_VERSION() primitive, so they commented it out and stuffed the version string into the parameter list. That doesn’t affect the normal operation of the driver module at all, and since generally the only thing that uses the output of modinfo is the Mark I Eyeball, most people (including me) didn’t even notice. But it blows DKMS right up, since the install function parses the output of modinfo to test module versions.

Sunil made a patch which worked (thanks Sunil!) but I didn’t like the way it broke Sam Harris’s versioning scheme, so I made my own. Then I noticed another bug in the module info, a spurious newline that doesn’t actually hurt DKMS, and I figured what the hell and patched that too.

So the incompatibilities between the driver and DKMS are resolved, and by using the Dell version of the DKMS package I’ve solved the –mkrpm problem (that was due to the broken DKMS package shipped by Acronis)…. but unfortunately it still doesn’t completely work.

Tomorrow I’ll figure out why I don’t get the new kernel module automagically compiled for me whenever I load a new kernel RPM. That is, after all, the whole point of this exercise. If I didn’t want fresh compiles I’d be using kmod, not DKMS.

new version of RHEL aoe initscript is up

Turns out Red Hat Enterprise Linux v6 is using udev 147, so the aoe udev rules as distributed by coraid & others are obsolete(ish). The kernel throws errors when it hits the NAME=”%k” clause, and the udev folks say “It is and always was completely superfluous. It will break kernel supplied DEVNAMEs and therefore it needs to be removed… Kernel 2.6.31 supplies the needed names if they are not the default.” The RHEL6 kernel is 2.6.32 so I have revised the rules accordingly.

(Remember, if you use zaoe you have to modify the mknode setting in the script to suit your version of Red Hat linux. Currently the default is RHEL6, although it’s mostly being used in RHEL5.)

Another, more worrisome issue is that using the Red Hat supplied kernel module causes problems with the new VSX and ESM hardware, but using the latest coraid module makes the kernel throw tons of ‘aoe: unknown ioctl 0x1’ errors. Damned if you do and if you don’t!

Red Hat just doesn’t get AOE

This post edited to correct my mistakes

ATA-over-Ethernet, the high performance low-cost SAN protocol developed by Sam Hopkins over at Coraid, never gets any love from Red Hat Enterprise Linux. The AoE kernel modules included with any given release are always egregiously out of date, and don’t even seem to be contemporary with the kernels they’ve been distributed with. If you want to use up-to-date drivers, you have to either wrap up coraid’s kernel module sources with DKMS and support a complete build chain, or create kernel-version independent modules, or else you’ll get system breakage from routine yum updates. Either way you have to muck about building RPMs so you won’t break package dependency and inventory tracking, and you have to set things up to get rid of the old Red Hat module each time you get a kernel update.

And despite the inclusion of an (elderly) aoe module in Red Hat, they don’t provide any officially blessed aoe-tools package. Coraid maintains a set of simple aoe management utilities, a remote console app for their aoe devices, and a throughput testing app, all free open source software. The basic aoe-tools package has been a part of Fedora for some time now, and lately coraid has been bundling the sources for the tools with the driver sources.

In my shop, we’ve been running more than 12 terabytes of AoE storage infrastructure on Red Hat EL 3, 4, and 5 for eight years or more now. Currently more than 150 TB in production. I had some pretty major problems with AoE on RHEL6, and openly blamed Red Hat’s out of date drivers for them, but those problems have been resolved. We had a bad VLAN trunk, basically, so it was really our fault (my sincere apologies to those I mistakenly accused).

You know, it’s always seemed to me that ATA-over-Ethernet should be a natural win for Red Hat. The simplicity of the protocol, when compared to Red Hat’s approved iSCSI, reminds me strongly of the value proposition that linux represented ten years ago when compared to proprietary unixes. You can build an AoE SAN that’s faster and more reliable than an iSCSI SAN for considerably less money; why would anyone purposely choose the less cost-effective solution? That’s why so many IT shops left HP-UX, SunOS, and MVS for Red Hat linux – because linux delivered the capabilities we needed for less cash.

Strangely, though, Red Hat treats AoE like an unloved and ugly stepchild, at best neglecting it entirely. Whenever a Fedora release is repackaged for publication as a Red Hat Enterprise Linux release, the aoe-tools package is removed; bleeding-edge Fedora probably provides a more stable AoE SAN platform than the flagship product. It’s deeply weird behavior and counterproductive for Red Hat.

I wrote an initscript for AoE under Red Hat that works under RHEL 3 through 6. It’s in the software section.

There’s an interesting discussion that references this blog post here.

udev under RHEL6 on a Dell m1000e blade server

The udev subsystem dynamically responds to udev events generated by the kernel and creates device nodes on the fly. This solves the old *nix problem of too many files in /dev that don’t have anything to do with the hardware you actually have. If you see a file named /dev/mouse, the system really truly has a mouse attached to it. Or at least that’s the plan. The kernel calls udev whenever you activate or deactivate any device, not just USB and PCMCIA cards, and this lets linux support absolutely any kind of hot-swappable hardware, even processors.

This is not the first dynamic hardware mapping scheme to hit linux – I think Red Hat EL4 used devfs and there was at least one other one before that. And people have tried to do it using the HAL daemon too.

The udev daemon reads “rules” that tell it how you want your devices named. This is so that different distributions that have historically used different names can all use udev, they just have their own rule sets. This also means that hardware invented tomorrow is trivial to add – just make a new rule.

In Red Hat Enterprise Linux v6, udev rules are read from multiple places in order to be maximally confusing and infuriating.

/lib/udev/rules.d is a “library” of standard rules that udev uses by default.

/etc/udev/rules.d is a place where you can put your own system-specific rules that will override the library. This is based on the names of the files, not the rules themselves.

/dev/.udev/rules.d is a secret hidden directory that seems to exist mostly to make you angry. I haven’t found any clear documentation of precedence of rule files found here. You have to experiment on a test system. The rules in here are usually referred to as “temporary rules” which may very well mean “permanent overrides” in normal English.

udev automatically detects changes to rules files, so changes take effect immediately without requiring udev to be restarted. However, the rules are not re-triggered automatically on already existing devices, so you might have learn to use the udevadm command (or just reboot) if you write any new rules.

But wait! There’s more! It was starting to make sense, and we can’t have that!

If you have a Dell system, a program called biosdevname will be invoked by rules in the library file /lib/udev/rules.d/71-biosdevname.rules whenever the system becomes aware that a network interface exists. The Dell white papers on the subject tell me that this program will rename your network devices and disk drives in a way that’s both counter-intuitive and inaccurately documented, but in reality it just renames the ethernet interfaces on a specific set of Dell systems.

Now, honestly, on some models of Dell server this behaviour makes a decent amount of sense; it labels the embedded ethernet ports on the system’s motherboard as “em1” and “em2” which is a good idea since Dell labeled the ports 1 & 2 (instead of the expected 0 and 1 – Real Computer Scientists can’t count past nine on their fingers). Then it will label any additional interfaces by the PCI bus positions, which is sort of uselessly informative but at least consistent.

On a Dell M1000e blade server, though, running biosdevname is worse than doing nothing. Instead of following their reasonably sane idea of making the labels on the chassis match the labels on the interfaces, the emn and pxpy naming scheme is mapped pseudo-randomly across the network ports. Net fabric A is arbitrarily designated as “embedded” (remember, on the M1000e none of the interfaces are embedded in the blades themselves), P3 is considered fabric B, and P1 is assigned to fabric C – even the order doesn’t match. So, on blade five, the interface names end up looking like this:

Blade Chassis Name -> Linux Interface Name
A1-port5 -> em1
A2-port5 -> em2
B1-port5 -> p3p1
B2-port5 -> p3p2
C1-port5 -> p1p1
C2-port5 -> p1p2

This is like purposeful obfuscation of location, and much worse than just nailing down eth0 through eth6 to specific MAC addresses (which RHEL6 would happily do if Dell’s biosdevname software wasn’t sticking its member into the pie).

The best solution I’ve found so far is to create a file named /etc/udev/rules.d/71-biosdevname.rules (thus overriding /lib/udev/rules.d/71-biosdevname.rules) and put a series of udev rules in it that associate specific names with specific MAC addresses. I map the actual Dell chassis labels like so:

Blade Chassis Name -> Linux Interface Name
A1-port5 -> ethA1p5
A2-port5 -> ethA2p5
B1-port5 -> ethB1p5
B2-port5 -> ethB2p5
C1-port5 -> aoeC1p5
C2-port5 -> aoeC2p5

With this setup, you will know which wire the telco guy unplugged as soon the system starts screaming about something gone wrong on ethA1p5 – it’s the Cat 6 ethernet wire in slot A1 port 5, obviously. You can also see which interfaces are intended to be used for our high-speed ATA-over-Ethernet SAN infrastructure.

That’s enough for today…