Aware of the RabbitEars Outage

Well, the new gear arrived fairly late in the day today, so the backups are only just now finishing. I'm testing a restore of the database backup on my local computer, and it claims it's going to take 15 hours. My local machine isn't as speedy as the server, but if the restore on the server takes a similarly long time, I'll want to do that piece of it overnight, which means the disk replacement really should happen in the afternoon, so long as the disks are still working.

(Ideally, if the disk replacement happened at 3 PM ET, I could start restoring the database by about 5 PM; if the restore took 12 hours, it would be done by 5 AM, not too long before I typically wake up, and I could continue my work from there.)

Separately, I've decided to have the power supply replaced as well, in case the problem is actually a power supply issue. More than one person has suggested it as a cause, and I'd thought it a possibility myself, so as long as the server is being taken offline for hardware replacement anyway, I might as well have that done too.

Given these two factors, and the fact I now have a family matter that's going to keep me heavily occupied on Tuesday and Wednesday, I'm tentatively planning to have just the power supply replaced tomorrow morning and see if that helps with the server's stability in the immediate term.

If it does, I'll very likely leave it be until at least Thursday (and I'll probably turn the Signal Search Map back on again), when the dust on the family issue settles a bit. If the power supply somehow completely fixes it, I'll decide whether to go through the disk replacement exercise at all right now, but I'm not expecting to be that lucky. I'm just hoping it's better than it is now so I can push the replacement back.

If it does not help at all, then I'll probably try to have the disks pulled tomorrow afternoon and do the database restore overnight on Monday night, meaning the site would go down Monday afternoon and not come back until at least Tuesday morning. If I don't finish by the time I have to deal with the family issue starting in the late morning hours, the restoration might drag out, which is why I'm hesitant to rush it.

Sorry this is dragging out so much. I want to get this resolved as quickly as everyone else does.

- Trip
 
Take no chances. Spend the time and money (already spent?) ONCE and swap ‘em all!

I'd do it eventually because I want the extra disk space, and I want to reload the system in a different way to redo the configuration for easier remote management. (For those who know, I set it up with BIOS RAID-1, but I've since come to the conclusion that I really should have done software RAID-1 within Linux. Lesson learned.) The question is whether I wait until I can be there in person, probably at least a month or two away, or have Silica Broadband do it for me remotely this week.
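For anyone curious what the Linux-side alternative looks like, here's roughly the shape of it. The device and array names below are placeholders rather than the server's actual layout, and in practice I'd just run the mdadm commands by hand; this is only a sketch of the approach.

```python
# Rough sketch of building a Linux software RAID-1 mirror with mdadm.
# /dev/sdb1, /dev/sdc1, and /dev/md0 are hypothetical placeholders.
import subprocess

def run(cmd):
    """Run a command, echoing it first, and fail loudly if it errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create a two-disk mirror from the two placeholder partitions.
run(["mdadm", "--create", "/dev/md0", "--level=1",
     "--raid-devices=2", "/dev/sdb1", "/dev/sdc1"])

# Put a filesystem on the array, then record it so it assembles at boot
# (Debian/Ubuntu-style config path shown).
run(["mkfs.ext4", "/dev/md0"])
scan = subprocess.run(["mdadm", "--detail", "--scan"],
                      capture_output=True, text=True, check=True)
with open("/etc/mdadm/mdadm.conf", "a") as conf:
    conf.write(scan.stdout)
run(["update-initramfs", "-u"])
```

The appeal over BIOS RAID is that the array can be inspected and rebuilt entirely from within Linux, which makes remote management a lot less fraught.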

Anyway, the server now has a new power supply and a new UPS. The UPS should prevent the occasional power-outage-related issues we've been having, and I'm now hoping the new power supply stabilizes things for at least a few days.

EDIT: I've reenabled the Signal Search Map for now. I'm trying to run the server as if it were running normally in order to see if it's stabilized. I don't want to create a false sense of security.

- Trip
 
I've made the decision to push the disk replacement to at least Thursday. I've got personal-life things going on Tuesday and Wednesday and I'd rather not be scrambling. The server went five hours without a disk error before throwing a few more. It looks like it's unhappy with a single sector, but I'm guessing it's more than that given the behavior of the past few months.

I'll post here again when I firm up the schedule.

- Trip
 
Well. Suffice it to say the server is not as stable as it appeared. It was throwing far fewer disk errors, but it suddenly went kaput about an hour ago. The disk went read-only again, so I can still get in and look at the data, at least.

I've already grabbed this morning's database backups and a handful of other changed files from the past week. I'm doing a backup of the Live Bandscan files before I try to do a quick repair. If I can't get it up again in short order, I guess I'll have the disks done tomorrow. I'm out of town until next weekend and was really hoping not to have to do this while away from home.

I was planning to be in bed 45 minutes ago. Of course it went down right before bed on a tiring day.

- Trip
 
Forgot to update this morning. So suffice it to say I did get it back up and running. I'm still hoping the server can continue to limp along in its current state until I get home next weekend. I'm out of town supporting a surgery recovery and it'd be a lot better to be in my normal environment at home when I need to restore from the backups.

- Trip
 
To update on this situation: per Scott's advice, I've been setting up a copy of RabbitEars on the same host that Scott now uses for SatelliteGuys. This helps in multiple ways: it will allow the site to stay up while the server is being reloaded, and it also gives me a chance to actually try doing the setup and see what problems I run into. It's moving along, and when I'm comfortable with it, I'll cut over to it.

In the process, I've ended up doing some maintenance and tidying of the database. I moved some large datasets that I use for personal projects into a separate MySQL database to make the backups and restores smaller. I'm also currently culling old Signal Search Map results, as results from as far back as 2019 are apparently fully half of the now-shrunken database's size. That's why the site's sluggish and under heavy load today.

Amusingly, I designed it to keep only results that had been viewed within a certain window of time, but that code never worked. So now I'm having to go back and do it all at once, and it's crawling along. My goal, if I don't get impatient and just stop part-way through, is to delete every Signal Search Map study that hasn't been viewed in the past 6 or 12 months (haven't decided which yet), but I will also keep studies of any age that are associated with a user account.
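For those who want to see the rule concretely, the cull boils down to a delete shaped roughly like the one below. The table and column names are made up for illustration (they're not the real schema), and the 12-month cutoff is just one of the two options I mentioned.

```python
# Illustrative shape of the cull: drop anonymous studies that haven't been
# viewed recently, but always keep studies tied to a user account.
# Table and column names are hypothetical stand-ins for the real schema.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="rabbitears",
                               password="...", database="rabbitears")
cur = conn.cursor()

cur.execute("""
    DELETE FROM signal_search_studies
    WHERE user_id IS NULL                          -- keep anything tied to an account
      AND last_viewed < NOW() - INTERVAL 12 MONTH  -- or 6 months, still undecided
""")
conn.commit()
print(f"Deleted {cur.rowcount} rows")

cur.close()
conn.close()
```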

- Trip
 
Much appreciated, thanks. :)
Glad to.

It's still deleting old rows. (I've been doing it in chunks, to allow the database update process, backups, and other jobs to happen in a timely fashion.) As of this moment, it has deleted everything not tied to a user name that was last viewed before October 17, 2022; in other words, studies with keys older than 786000 and last-viewed timestamps older than 2022-10-17.
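"Doing it in chunks" just means repeating a bounded delete until nothing in range is left, so the backups and update jobs can still get a word in edgewise between batches. A rough sketch follows; as before, the table and column names are hypothetical, and the October 17, 2022 date simply stands in for wherever the sweep currently is.

```python
# Sketch of chunked deletion: remove a bounded batch at a time so a
# long-running delete doesn't starve the backup and update jobs.
# Schema names are hypothetical stand-ins for the real tables.
import time
import mysql.connector

BATCH = 5000
CUTOFF = "2022-10-17"

conn = mysql.connector.connect(host="localhost", user="rabbitears",
                               password="...", database="rabbitears")
cur = conn.cursor()

while True:
    cur.execute("""
        DELETE FROM signal_search_studies
        WHERE user_id IS NULL AND last_viewed < %s
        LIMIT %s
    """, (CUTOFF, BATCH))
    conn.commit()
    if cur.rowcount == 0:   # nothing left in range; done
        break
    time.sleep(1)           # give other queries some breathing room

cur.close()
conn.close()
```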

Separately, in the middle of this fiasco, my laptop's hard drive has started to die as well. So I'm scrambling to make a current backup of that too so I don't lose anything on it, and I have a new SSD on order to replace it, hopefully arriving today. Seriously, when it rains, it pours.

- Trip
 
Well, the server went down on me pretty hard this evening, and I've moved to the temp server. Note that the temp server does not have the Live Bandscan set up on it. Going to try to get it up and running again tomorrow long enough to make sure I have the full backup and then try to have Silica Broadband do the hardware replacement for me, which will require reloading the server.

- Trip
 
The best news on the situation now is that I have a full backup. Nothing has been lost, and the temp server just finished getting updated with today's LMS data now that it's become available for the first time since last week.

The good news is that the VPS is holding up pretty well. I just reenabled the Signal Search Map and nothing seems to have collapsed, so that's good.

The good-to-neutral news is that the message it gave me when it went down yesterday was new, and from reading up on it, while it can indicate a disk issue, it's also possible it could be a RAM issue. As such, I'm going to try to have Silica Broadband run MemTest on it to see what happens. If it fails, I'll have new RAM shipped out and see if that fixes things so I can avoid the remote reload. If it passes, then we'll proceed to the disk replacement and reload.

The neutral-to-bad news is that there was a code injection attack on RabbitEars which caused some people to see weird stuff like strange ads. I've fixed it, in multiple ways, but I'm still getting reports. If you're still seeing it, please let me know. Press Ctrl+F5 to reload from scratch and clear the cache; there's not much I can do to force your cache to clear beyond what I've already done, which involved renaming the impacted files and then changing the pointers to them in the code.
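For anyone wondering why renaming the files helps: a file that keeps its old name may be served from your browser's cache indefinitely, while a new name is guaranteed to be fetched fresh. The idea is roughly the sketch below; the filename is made up for illustration and isn't an actual RabbitEars file.

```python
# Sketch of cache-busting by renaming: give the cleaned file a new,
# content-derived name so every browser has to fetch it fresh.
# The filename used here is hypothetical.
import hashlib
from pathlib import Path

def busted_copy(path: Path) -> Path:
    """Write a copy of the file whose name includes a short content hash."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()[:8]
    new_path = path.with_name(f"{path.stem}.{digest}{path.suffix}")
    new_path.write_bytes(data)
    return new_path

# e.g. scripts/site.js -> scripts/site.3fa2b1c9.js; the pages that
# referenced the old name then get updated to point at the new one.
print(busted_copy(Path("scripts/site.js")))
```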

The bad news is the Live Bandscan will remain down for the duration. While I'm impressed at how well the VPS is holding up given everything I'm throwing at it, I don't want to play games with it by trying to get that running and then have it bog down to the point of uselessness. I miss it already, but better safe than sorry at this point.

=====

I don't expect this to be resolved today, as I expect the MemTest won't happen until later in the afternoon, and I'm going to let it run for a few hours at least, possibly overnight, unless it throws up errors really quickly. It's also possible that it's both the RAM and the disks, but if the RAM tests bad, I won't be able to verify that until after the RAM is replaced, an fsck is done to fix the outstanding disk issues, and I put load back on the server.

And, of course, if all those things don't fix it, then it's the board. At that point, all bets are off and I'm effectively building a new machine, but then I have a bunch of expensive hardware sitting many hours away that I should really be using for my rebuild. Not sure what I'd do in the immediate term given that eventuality. Let's hope it doesn't come to that.

- Trip
 
It's still progress. Hang in there and thanks!
 
The bad news is the Live Bandscan will remain down for the duration.
So what are we talking about in terms of "duration"? I have a couple of new HDHRs I've been waiting to add in new location(s), and I don't want to put any pressure on you, because I know this whole effort is a volunteer gig for you, but just as a ballpark estimate, when do you think the Live Bandscan will be back and able to accept new devices?

In the meantime, thanks very much for all your efforts with RE generally. Can't say enough how much I, and I'm sure many others, appreciate it!
 

I'm not sure. I was hoping to have the MemTest going by now but I know my contact at Silica Broadband was out and about today and I wouldn't blame him if he just went home after his lengthy day in the field. I pay nothing and depend on his generosity in both hosting and time, so since I'm not completely offline and have my full backup, that's okay with me.

I've been talking to Russ about testing the Live Bandscan on the VPS, just in case this drags out. I copied all the files over this afternoon, so everything I need to do to make that happen is done. That could happen as soon as tomorrow, but we will see. (Obviously, if we set it up and the whole thing screeches to a halt, we'll turn it off again.)

- Trip
 
I'm not going to start the setup just yet; I forgot to have him change a BIOS setting so the operating system has to be reloaded again. D'oh!

But it went quickly and smoothly otherwise, so once that's resolved, I'll get everything loaded back up. Then at some point, maybe over the weekend or maybe next week depending on how fast I am, I'll cut back over to the primary server and hopefully put this mess behind us.

Separately, the Live Bandscan is now running on the temp server. It clearly bogs this server down more than it did the primary one, but nothing screeched to a halt and the load is well within reason, so I'll call it a win.

- Trip
 
My bandscan tuner seems to have been lost in the shuffle. I can see it scanning but I am no longer in the list of sites. Lebanon, TN
 