Considering the Source

Fixing up a Netgear R7000P Router

Troubleshooting and installing DD-WRT

From: Wed 28 July 2021

Category: Tech

Tags: linux | repair | arm | golang | networking

I was given this broken Netgear router, which is a R7000P with 5 gigabit ports and 2 Broadcom BCM4360 wireless chips. It supports all kinds of fancy shit I don't understand like MU-MIMO, which sounds like an adorable robot, and beamforming, which sounds like a futuristic type of art. It also sports a USB3.0 port, which means real storage. There's a lot of potential in this machine, so I'm going to try to figure out the problem and install newer open-source firmware. This device would be really useful as a client bridge for wired machines and as a test bench for network measurements.

After some research, I decided on DD-WRT as the new firmware. My first guess was OpenWRT, but they don't allow nonfree drivers for Broadcom chips. I respect this position, but with Broadcom chips the first party closed-source driver has much better performance than the open-source alternatives. In this case it makes sense to taint the kernel to take advantage of the chipset's full performance. DD-WRT has some sort of an agreement with Broadcom where they can package the "wl" drivers, so it's an easy installation.

Troubleshooting

Before doing anything else, I did a hardware reset on the router by holding the "reset" button for 30s right after powering on. This is an ARMv7 based router, so it doesn't require the "30/30/30" reset like older hardware. This brought up the router on the default IP address and SSID settings, but after a few minutes it stopped responding on 192.168.1.1 and required a couple reboots before it was responsive. Even though it kept crashing after a few minutes, I had web UI access so I downloaded a new copy of the stock firmware from Netgear's website and flashed it to the board. The new firmware copy seemed to clear up the crashes, I suspect there was some corruption of the stock firmware but I didn't think to pull an image of it before flashing for analysis.

Solar flares strike again!

Getting DD-WRT

$ ncftp ftp://ftp.dd-wrt.com/betas/

Connect via FTP to the betas site and download a release from the 'r7000P' directory. I used the 07-22-2021-r47086 release, as it was mentioned in one thread to be working well on the R7000P.

Flashing DD-WRT

This was super simple, I just used the web console of the built-in firmware to flash the factory-to-dd-wrt.chk file. After a few reboots, the machine came up with the default DD-WRT firmware. If your router is bricked or otherwise stuck without the web UI, you can use the excellent nmrpflash utility to flash it as the router boots.

Unfortunately, the stock DD-WRT flash didn't come up on the default IP address or broadcast a network. To debug, I need to connect to the internal TTL serial header.

Connecting a USB TTL serial console

With the LEDs at the bottom of the board, the left-to-right order of attaching the cables is tx, rx, then gnd. In my case that's white, black, then brown wires.

This is different than the R7000, which is unfortunately pictured on the OpenWRT wiki in the R7000P page. After connecting to the console, the debugging picked up in earnest.

Setup and the Client Bridge

There was a long loop of these messages in dmesg:

dhdpcie_readshared: address (0xf8ee0711) of pciedev_shared invalid
Waited 5006383 usec, dongle is not ready
dhd_bus_init :Shared area read failed
dhd_bus_start, dhd_bus_init failed -1
dhdpcie_init: dhd_bud_start() failed
dhd_detach(): thread:dhd_watchdog_thread:2fc terminated OK
dhd_flow_rings not initialized!
dhdpcie_pci_probe: PCIe Enumeration failed
dhdpcie_init: can't find adapter info for this chip
DHD: dongle ram size is set to 1835008(orig 1835008) at 0x200000
dhd:0: fw path:(null) nv path:(null)
_ctf_attach:attach dhd
dhd_attach(): thread:dhd_watchdog_thread:2fd started
dhd_deferred_work_init: work queue initialized
dhd:0: fw path:(null) nv path:(null)
dhd_bus_download_firmware: firmware path=, nvram path=
dhdpcie_ramsize_adj: Enter
dhdpcie_ramsize_adj: Adjust dongle RAMSIZE to 0x240000
dhdpcie_download_code_array: Downloaded image is corrupted (4366c0-roml/pcie-ag-splitrx-fdap-mbss-mfp-wnm-osen-wl11k-wl11u-txbf-pktctx-amsdutx-ampduretry-chkd2hdma-proptxstatus-txpwr-authrmf-11nprop-obss-dbwsw-ringer-dmaindex16-bgdfs-stamon-hostpmac-murx-hostmemucode-splitassoc-dyn160, 2021.07.19.032415, 2021/07/19 03:24:15).
dhdpcie_bus_write_vars: Downloaded NVRAM image is corrupted.

After researching this for a while, I decided to erase the NVRAM contents (wiping out any user-defined settings). This was done via the root console:

# erase nvram
# reboot

While this brought the router up with default settings, it didn't affect the boot messages about the NVRAM image corruption. From what I gather, this machine may have damage to its NVRAM chip. Clearing it again, reprogramming it, and restoring backups of the settings had no effect on this message at boot. Thankfully, it seems informative-- the NVRAM settings are being saved properly as I update them and persist between reboots. Shruggie emoji.

These two commands are essential for bailing yourself out of a bad configuration via serial console:

# nvram show | grep lan_
# nvram set lan_ipaddress="10.0.0.1"

The first allows you to view current settings (usually defined through the web UI) and the second allows you to modify them. If you're like me and "Apply" settings from the web UI too early, this lets you figure out what other settings need modification so you don't have to hard reset the router to fix things.

The "client bridged" article works great with my device, but it only uses one of the two chips. The speed I'm getting through iperf3 measurements is capped at about 130 Mbit/s. It may be possible to connect both the chips independently, but this will have to be for research another time.

Installing Entware and opkg

Running other software on this device is a lot easier with a proper package manager. Entware is a simple package manager for embedded devices that keeps all its data in /opt and is commonly used on routers. The installation is quite simple, once you get a USB device set up. It seems that DD-WRT does not support automounting ext4 volumes, I went with the suggestion to use ext2 and it worked fine from there on out. Since we're in a more limited environment, use blkid /dev/sdX1 to get the UUID of the partition to automount rather than looking through /dev/disk/by-uuid/. Follow this guide to enable https for the package mirrors, which is necessary due to Entware's lack of signing.

Using node_exporter

I have a Prometheus environment that uses node_exporter on most of my systems to collect system-level statistics from each machine. The router is an ARMv7 machine, so it should be able to run node_exporter natively as it's written in Golang. I could cross-compile it, but where's the fun in that- I want to build the package on the router itself. Let's start off with an opkg install go_nohf. There's also a "go" package, but as this issue describes it's only for machines with hardware floating point. Downloading and installing this package took some time, most of which seemed to be in extracting it to the USB drive.

To build node_exporter:

# go get -v github.com/prometheus/node_exporter

This of course completely pegs both cores of the CPU and makes the machine unresponsive for the duration of the build. While it got through building quite a few packages, this seems to consistently crash soon after building google.golang.org/protobuf/encoding/prototext. There are no OOM killer messages in dmesg but this seems to be an OOM killer situation-- the memory usage of the build process was getting close to the machine's 256M limit. Ah well. Let's try cross-compiling from an x86 machine:

$ cd go/src/github.com/prometheus/node_exporter
$ GOARCH=arm GOARM=7 go build -o node_exporter_arm .
$ scp node_exporter_arm R7000P:/opt/

Unfortunately this doesn't seem to work:

# /opt/node_exporter_arm
Illegal instruction

Fun times. I know Golang needs floating point support to run and without tracing the binary further my guess is that VFP support is disabled- which is correct:

# grep Features /proc/cpuinfo | uniq
Features    : half fastmult edsp tls

There's no FPU in the BCM4709A0, and it looks like I'd have to recompile the DD-WRT kernel to enable virtual floating point. There's another option too; based on this issue, GOARCH=5 will compile with soft floating point emulation support as ARMv5 boards have no FPU. This is a good enough workaround for now, but if there's performance issues with this exporter I'll try to enable VFP in the kernel.

$ GOARCH=arm GOARM=5 go build -o node_exporter_arm .

The node_exporter binary now works as expected. Performance-wise, a simple load test shows about 15QPS which is more than enough for the use-case.

$ wrk http://R7000P:9100/metrics
Running 10s test @ http://R7000P:9100/metrics
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   572.47ms  138.66ms   1.01s    66.67%
    Req/Sec     9.85      5.03    20.00     74.19%
  168 requests in 10.02s, 6.93MB read
Requests/sec:     16.77
Transfer/sec:    707.85KB

While most important metrics are being collected by node_exporter, it fails on a number of different collectors:

$ curl -s http://R7000P:9100/metrics | awk '/node_scrape_collector_success/ && $NF == 0' | cut -d\" -f2
bonding
fibrechannel
hwmon
infiniband
ipvs
mdadm
meminfo
netstat
nfs
nfsd
nvme
powersupplyclass
pressure
rapl
schedstat
softnet
tapestats
zfs

Conclusion

This Netgear R7000P works as a wireless client bridge, running Linux and exporting statistics about its performance and network throughput.