I am a little confused here. Are you saying that when I pause an IP stream on the H3 for up to an hour, the data is not written to the hard drive? It would seem that a lot of temporary memory would be required to buffer an hour’s worth of data.
My apologies if my simplified description left some confusion.
Before I try to answer your question, let me give you a brief overview of MPEG-4 video compression. The MPEG-4 standard is huge, and only a part of it deals with video compression, so I will only describe the compression part. But even a simplified description is a bit long, so bear with me.
The most common usage is H.264 compression. Usually the lossy part of the standard is used, which throws out bits of the video that are redundant or that the human eye can’t see or won’t miss. A video frame is divided into a variable number of rectangular regions. Each region may be compressed differently depending on the content of the video it contains. The compression can vary from very little to an 80%–90% or greater reduction in size. This gives the video stream its size reduction. In addition, each region will contain reference snapshots of the data in the region. The reference snapshots are used to track how the video in the region changes over time. If nothing changes in the region, the reference snapshot is used to reconstruct the video frame. If there are changes in the region, only the changed data is included in the region data stream, and the rest of the region reconstruction comes from the previous reference snapshot. If the changes are extensive enough, a whole new reference snapshot is created. This gives further data size reduction and temporal compression.

All of this work is done by the encoder, and the resulting data stream is a constantly varying list of varying-sized region changes and reference snapshots. The data stream is constructed so that it is far simpler to decompress than it was to compress. Additionally, time codes are buried in the MPEG-4 data stream to correlate the varying compressed video with linear time.
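If a concrete picture helps, here is a toy Python sketch of just that last idea, reference snapshots plus region-level change lists. It is not real H.264; the region names and the stream layout are invented purely for illustration.

```python
# Toy illustration (not real H.264): rebuild frames from a reference
# snapshot plus per-region change lists; untouched regions come from
# the last reference snapshot.

def reconstruct_frame(reference, changes):
    frame = list(reference)                 # start from the reference snapshot
    for region_index, new_data in changes:
        frame[region_index] = new_data      # overwrite only the regions that changed
    return frame

reference = ["sky", "tree", "road", "car"]            # four regions of a toy frame
stream = [
    ("changes",   [(3, "car moved 1m")]),             # only the car region changed
    ("changes",   [(3, "car moved 2m")]),
    ("reference", ["sky", "tree", "road", "truck"]),  # big change: whole new snapshot
]

for kind, payload in stream:
    if kind == "reference":
        reference = payload
        frame = list(payload)
    else:
        frame = reconstruct_frame(reference, payload)
    print(frame)
```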
To make things like trick play (fast forward/reverse, single-frame stepping, etc.) workable and usable, a method has to be used to quickly access the correct point within the varying compressed video data stream based on linear time. Many methods have been devised to do this, and they have spawned many patents and patent fights. But in general, all of these methods build some kind of indexed lookup database to determine where to go in the varying video data stream based on a desired linear time index.
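As a rough sketch of that lookup idea (the table contents and names here are made up, not any real player’s format), a sorted time-to-offset table can be binary searched to find the nearest reference snapshot at or before the requested time:

```python
# A minimal sketch of a time index: a sorted table of
# (seconds, byte offset of a reference snapshot), searched with bisect.
# All values and names are invented for illustration.
import bisect

index = [(0, 0), (2, 150_000), (4, 310_000), (6, 455_000), (8, 610_000)]
times = [t for t, _ in index]

def seek_offset(seconds):
    """Byte offset of the last reference snapshot at or before the
    requested time, so the decoder can start from a clean snapshot."""
    i = bisect.bisect_right(times, seconds) - 1
    return index[max(i, 0)][1]

print(seek_offset(5.3))   # 310000: jump to the 4-second snapshot, decode forward
```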
Now to your question. You correctly assume that there is no way a whole hour of even compressed video could be held in memory. Instead, the common method is to use a three-stage buffer scheme. The first stage is to keep a couple of fully decompressed frames in memory surrounding the current display point. This allows things like frame-by-frame single stepping with no perceivable time delay. This is the same video data that is sent to the HW frame buffer for display. The second stage is to buffer some chunk of compressed video stream data in memory for quick access, depending on the direction the video is being displayed (forward or backward), using the time indexing method. Getting the video stream data in large chunks reduces the number of hard drive accesses and the load on the managing CPU. This stage comes into play for fast forward/reverse or skipping forward or backward. The last stage is to use the time index method to go directly to the hard drive for larger time spans within the varying video data stream, typically when a video is first started or when returning to a previously stopped video for replay. (And again, think many patents and patent fights.)
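To make the three stages a little more concrete, here is a rough Python sketch. Everything in it, the chunk size, the fake “disk,” the window of decoded frames, is invented for illustration and is not how any particular DVR actually does it.

```python
# Rough sketch of the three-stage buffering idea; all sizes, names, and the
# fake "disk" are invented for illustration.

FAKE_DISK = {t: f"frame@{t}s" for t in range(3600)}   # one "frame" per second of a 1-hour recording

def read_chunk_from_disk(start):
    """Stage 3: use the time index to pull a 60-second span of 'compressed'
    data off the drive in one large read."""
    end = min(start + 60, 3600)
    return {t: FAKE_DISK[t] for t in range(start, end)}, (start, end)

class PlaybackBuffer:
    def __init__(self):
        self.decoded = {}                           # stage 1: a few decoded frames near the play point
        self.chunk, self.chunk_range = {}, (0, 0)   # stage 2: compressed chunk held in RAM

    def frame_at(self, t):
        if t in self.decoded:                # stage 1 hit: instant, no decode or disk access
            return self.decoded[t]
        lo, hi = self.chunk_range
        if not (lo <= t < hi):               # stage 2 miss: go to the drive (stage 3)
            self.chunk, self.chunk_range = read_chunk_from_disk((t // 60) * 60)
        lo, hi = self.chunk_range
        # "Decode" a small window around t into stage 1 for smooth frame stepping.
        self.decoded = {s: self.chunk[s] for s in range(max(t - 2, lo), min(t + 3, hi))}
        return self.decoded[t]

player = PlaybackBuffer()
print(player.frame_at(125))   # fresh seek: reads the 120-180 second chunk from "disk"
print(player.frame_at(126))   # single step: served from the already-decoded window
```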
As you can imagine, it is a complex process to coordinate all of the buffering so that things don’t get out of sync. And remember, there is a similar process going on for the audio that I didn’t talk about, which has to be synced with the video.
Why is there all of this complexity? It’s to reduce the SW/HW resources needed to accomplish the task, and to reduce the cost of the HW and the CPU horsepower required. Otherwise products would be prohibitively expensive and unaffordable. All of this leads to a long list of compromises and trade-offs when a product is first designed and specced. And because not every feature can be foreseen or anticipated, those compromises and trade-offs often lead to limitations on what new features and functionality can be squeezed into the product in the future. This is why we get newer, better product versions instead of endlessly updated products.
As I said in the original post, things are way more complicated under the covers than they seem from the outside. And now think about messing with that delicate complexity to add a new feature. I’m sure you can see why some things just take a long time to develop.