I am recovering videos from a broken drive and I literally have nothing but raw data to work with. I can already recognize most containers by the first few bytes. AVI starts with RIFF, MP4 starts with ftypmp* and so on. What I need is a way to tell when a file ENDS. WMV does the work for me because the file size is in the header, and so does AVI, but that doesn't matter because the obvious patterns at the end of an AVI give it away.
Mostly I need help identifying the correct length of MP4s.
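For reference, the size stored in an AVI header can be read straight out of the raw data. This is a minimal sketch, assuming a plain RIFF file: bytes 4 to 7 hold a little-endian payload size that does not count the 8-byte RIFF header itself. The function name `riff_total_size` is made up for illustration, and large AVIs may continue in further RIFF chunks (the OpenDML extension), which this sketch does not follow.

```python
import struct


def riff_total_size(buf: bytes, start: int = 0):
    """Return the byte length of the RIFF chunk at `start`, or None if absent."""
    if len(buf) < start + 8 or buf[start:start + 4] != b"RIFF":
        return None
    # The 32-bit little-endian size covers everything after the first 8 bytes.
    (payload,) = struct.unpack_from("<I", buf, start + 4)
    return payload + 8
```

Calling it on a buffer that starts at the `RIFF` signature gives the number of bytes to select from that offset; extra data selected past the end does not affect the result.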
For mp4, the mdat + moov atoms are almost the complete file. Look at some good files in a hex editor and you'll see.
I'm not aware of a place where the absolute number is stored. It might not be. Perhaps someone else will chime in -
Understanding how an MP4 container can be structured (it is similar to MOV and 3GPP) requires studying the "ISO Media" container specifications. Several chunks are mandatory, others are optional, but they all follow a hierarchy; they cannot be placed in any arbitrary order. Chunks usually have a size field, so you can skip forward that number of bytes to find the next chunk (maybe with some exceptions for "list" style chunks), until you reach the end of the file.
-
Atoms? What are you talking about? "Almost the complete file" doesn't cut it either. I need to know exactly when it ends. It's obvious with AVI but not with MP4 and others.
Ligh.de, many of these videos are hours long so I will not have time to inspect every individual chunk of every video to say the very least. I need a way to quickly tell where videos begin and end and so far it's only possible with AVI and WMV. Even this is taking forever where I can only recover a few in one hour. -
Try to understand the purpose of the following tool:
http://atomicparsley.sourceforge.net/
Then we talk again. -
The size of a file is in the directory entry for the file. Otherwise, for the MP4 container, you have to parse the atom tree to get the size. There is no overall file size entry in the MP4 headers that I know of. Each atom starts with a 4 byte size element (big endian) followed by the four character atom name and then data for that atom.
Last edited by jagabo; 8th Sep 2018 at 12:01.
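The atom layout described above (a 4-byte big-endian size, then a four-character name) is enough to find where an MP4 ends without inspecting the video data itself. The following is a rough sketch, not a complete parser: `mp4_length` and the `TOP_LEVEL` set are illustrative names, and the list of top-level atom names is an assumption that is probably incomplete. It hops from one top-level atom to the next and stops at the first thing that does not look like an atom.

```python
import struct

# Assumed (incomplete) set of atom names that can appear at the top level.
TOP_LEVEL = {b"ftyp", b"moov", b"mdat", b"free", b"skip", b"wide",
             b"uuid", b"moof", b"mfra", b"sidx", b"styp", b"pdin", b"meta"}


def mp4_length(buf: bytes, start: int = 0):
    """Walk top-level atoms from `start`; return (total_bytes, atom_names)."""
    pos, names = start, []
    while pos + 8 <= len(buf):
        size, name = struct.unpack_from(">I4s", buf, pos)
        if name not in TOP_LEVEL:
            break                      # no longer looks like part of the file
        header = 8
        if size == 1:                  # a 64-bit size follows the name
            if pos + 16 > len(buf):
                break
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:                # "runs to end of file": unknowable here
            break
        if size < header:
            break                      # corrupt size field
        names.append(name.decode("ascii"))
        pos += size                    # skip the payload without reading it
    return pos - start, names
```

Because only the 8 or 16 header bytes of each atom are read, this stays fast even for files that are hours long. If the returned length is larger than the selected data, the selection was too short; if the walk stops early, the file is probably fragmented at that point.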
-
Aludin,
Have you not tried the standard file recovery tools such as Getdataback or ZAR (Zero Assumption Recovery) ? I have had success with both of these. -
Ah, I get it now. MP4 is more 'raw' than others so info like size has to be inferred by putting the chunks together. The word atom confused me. Basically, I can overestimate the data I'm selecting with the hex editor and this program will have no problem knowing when it ends. This is doable. Thank you.
At first I thought I would have to manually add up the data of every single chunk. Derp!
jagabo, the MFT is lost, that's the problem. Directory indexes are still scattered on the drive but without knowing where on the disk each entry points to, I have no way to know which file is which.
DB83, I have tried recovery tools though not the ones you mentioned. They are incredibly unreliable so I decided I had no choice but to do it manually. This drive is largely unfragmented yet most of what these programs autorecovered was pieces. Do you know of one which is more interactive? I'd like to see exactly the data it has selected and approve each one. -
ZAR will preview what it is going to recover. Since it scans the entire disk or partition (selectable) this process will take some time - many hours for a large partition.
I believe the free version only scans and previews. You would have to purchase the program to do a full recovery of the files you select. But as I said I recovered an entire partition with the full program so more than happy with it. -
Guys, I've figured out what the headers of most formats look like but I need some assistance with the TS format. There doesn't seem to be a fixed pattern. MPEG files are also weird. I noticed most of them start with 00 00 01 Bx but they have this header multiple times in the same video.
Also, is there a way to know the size of an MKV file? -
Transport streams are designed to be picked up midstream (for example, when you turn on your TV you don't have to start watching a show from the beginning) so headers are repeated every half second or so. Transport streams can also change resolution (for example ads are often at a different resolution than the show).
-
-
Have you not tried the standard file recovery tools such as Getdataback or ZAR (Zero Assumption Recovery) ? I have had success with both of these.
As for detecting the end of a file, the usual behaviour of such recovery software, for file types which do not have a field in their header giving the expected complete size, is to attribute all subsequent sectors to a given file until the header of the next file is found. But the main problem of raw file recovery is fragmentation : on a drive with regular activity, where lots of files have been written and deleted then written again, most files (especially large video files) will be fragmented, i.e. the end of a file does not directly follow the beginning, there can be a gap, small or very large, the end can even be located before the beginning, there can be 2-3 chunks or hundreds, it's totally unpredictable, and there is no regular software that I'm aware of which can efficiently reconstruct fragmented files. Photorec purports to do just that, but in my experience it very rarely gets even one fragmented file complete. Fragmentation can be tremendous in the case of files downloaded simultaneously, even when there's plenty of available space : I once had a situation where I had recovered almost the whole contents of a failing 3TB HDD, except 6 video files, but several of those were downloaded at the same time and thus massively fragmented, with as many as 8000-12000 little chunks each, I made a mistake and had to use a very special trick to rebuild them (in a nutshell : I knew thanks to a read scan with HD Sentinel and a few other tricks that those particular files contained bad sectors so I put them in a separate folder while I was recovering everything else, hoping to not damage the drive further by insisting repeatedly on damaged areas, and it turned out to be a wise move indeed, because when I did try to get those files the state of the drive deteriorated quickly ; then, as I was limited in storage space for the recovery, I chose to image only the area where those files were located, with a safety margin before and after, plus the first 10GB, as I thought that it would contain all the metadata needed to extract those files, but that proved to be unwise, as the MFT itself was fragmented and had extents near the middle and near the end of the drive, it was stupid because I already knew of a method – with ddrescue and a complementary tool called ddru_ntfsbitmap – to specifically recover the very important MFT before attempting to recover the data area, but you're bound to do stupid things when your own data is in jeopardy if you don't take some time to think it through and devise a thorough plan of action before attempting anything... ; so I managed to recover most of those files' sectors, but I couldn't extract the actual files, with R-Studio or WinHex, as I didn't have the complete MFT – as I found out, those tools can't rely solely on the volume snapshots / scan information backups that they generated to keep track of files' allocation information, they need to interact with the actual MFT – and by that point the drive was too damaged to recover the missing chunks ; but luckily, I had copied the complete list of sectors allocated to each of those files, obtained with three different tools – HD Sentinel, Recuva, nfi.exe – and with that I managed to create ddrescue scripts to rebuild the actual files by copying the relevant sectors from the image in an automated way... otherwise it would have been waaay too time-consuming to do that manually !).
I am recovering videos from a broken drive and I literally have nothing but raw data to work with.
Both R-Studio and Photorec can open a volume image as well as a physical drive (with Photorec either launch it via the command line and specify the path to the image, or – on Windows – right-click on the image, select “open with...”, then locate photorec.exe).
Also, is there a way to know the size of an MKV file?
Last edited by abolibibelot; 28th Oct 2018 at 14:59.
-
Wow, had no idea of your reply abolibibelot. I must have email notifications turned off. I'll read up on it all tomorrow. For the future, separate your paragraphs so it's easier to read. kthx
But just so everyone knows, I've already recovered everything by hand so this case is pretty much closed. It took forever and I never wanna do anything like this again. My left hand is ******* burning. I learned a lot in the process and can easily see how this can be automated but in the end I didn't wanna pay for that program that DB83 recommended so I did it manually. The beginnings and ends of all these files (except MKV and WMV) are so obvious.
For the record, TS files always begin with bytes 47 40 and the header is followed by a bunch of repeating bytes so it's hard to miss. The only problem is that the header repeats in the same video much like MPEG so it's hard to tell if it's one video or multiples. -
^^ You only get a notification for the first post after your latest reply so you should get a notification about this reply. That is until you visit the forum again.
As for doing file recovery manually I take my hat (if I wore one) off to you. But as you said the drive is not fragmented. To attempt this on a fragmented one would, in my estimation, be an even more thankless task.
And sometimes to pay even a small sum is worthwhile. Unless you have tons of time on your hands -
Hell of a story, abolibibelot. Backing up the MFT is essential indeed, if you can't back up the whole drive that is. After this incident, I have backed up the MFTs of all my important drives.
You say photorec and all these other recovery tools only look for headers of a file and not endings, but they should. Look at the ending of any AVI file and observe the pattern. It's 01wb or whatever and then 4 bytes of steadily increasing values and then 00dc or another 01wb. MP4 has a similar pattern. WMV does too but it's not obvious because it's only like 1000 bytes, not 10% of the file. MKV sometimes has some readable text of metadata at the end but it's not consistent.
The developers of these recovery tools need to collaborate with the designers of these media containers.
That means that all the file system data is completely gone ? How did you get that “raw data”, perhaps by imaging or cloning the defective drive with something like ddrescue ?
Are there lots of bad sectors at the beginning ? (Usually the MFT, Master File Table, which contains all the metadata and file allocation information, is located within the first few gigabytes.)
It's good to know MKV does store size info but the address should always be fixed like WMV is.
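The size in an MKV is not at a fixed address because Matroska sizes are variable-width numbers, but it is still quick to reach. What follows is a minimal sketch under stated assumptions: the file begins with the EBML header (ID `1A 45 DF A3`), the Segment element (ID `18 53 80 67`) follows it directly, and the Segment size was actually written (live or unfinished recordings store an "unknown" size of all 1 bits). `read_vint` and `mkv_length` are illustrative names.

```python
EBML_ID = b"\x1a\x45\xdf\xa3"
SEGMENT_ID = b"\x18\x53\x80\x67"


def read_vint(buf: bytes, pos: int):
    """Decode an EBML size field; return (value, width_in_bytes, is_unknown)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("size fields wider than 8 bytes are not handled")
    width = 9 - first.bit_length()        # leading zero bits give the width
    if pos + width > len(buf):
        raise ValueError("truncated size field")
    value = first & (0xFF >> width)       # drop the length-marker bit
    for byte in buf[pos + 1:pos + width]:
        value = (value << 8) | byte
    unknown = value == (1 << (7 * width)) - 1
    return value, width, unknown


def mkv_length(buf: bytes, start: int = 0):
    """Return the total MKV length in bytes, or None if it cannot be read."""
    try:
        if buf[start:start + 4] != EBML_ID:
            return None
        header_size, width, _ = read_vint(buf, start + 4)
        pos = start + 4 + width + header_size     # first byte after EBML header
        if buf[pos:pos + 4] != SEGMENT_ID:
            return None
        segment_size, width, unknown = read_vint(buf, pos + 4)
        if unknown:
            return None
        return pos + 4 + width + segment_size - start
    except (IndexError, ValueError):
        return None
```

A `None` result means the length has to be found some other way, for example by looking for the next file's header.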
DB83, the drive actually is fragmented but not much. About 10% of the files were, 1/4 of them horribly.
I see now that this process can be automated to a large degree but not as accurate as human intervention. I only have to notice patterns in the bytes to see that it's obviously the beginning or end of a video format but this isn't as easy to teach a program. I mean, searching for hex "4740" on the whole drive returned millions of results. If it's really a .TS, it would start with 4740 and then have a bunch of patterns but they aren't fixed enough to relegate to a simple keyword that a program can look for.
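One way a program could cut down those millions of hits: transport stream packets are normally 188 bytes long and each starts with the sync byte 0x47, so a real stream shows 0x47 again exactly 188 bytes later, and again after that. The sketch below assumes plain 188-byte packets (some recordings use 192-byte or 204-byte packets, which it ignores); `looks_like_ts` and `ts_run_length` are made-up names.

```python
TS_PACKET = 188   # assumed packet size; M2TS (192) and FEC (204) variants exist
SYNC = 0x47


def looks_like_ts(buf: bytes, pos: int, packets: int = 5) -> bool:
    """True if `packets` consecutive sync bytes sit 188 bytes apart from `pos`."""
    last = pos + TS_PACKET * (packets - 1)
    if last >= len(buf):
        return False
    return all(buf[pos + k * TS_PACKET] == SYNC for k in range(packets))


def ts_run_length(buf: bytes, pos: int) -> int:
    """Count whole packets from `pos` until the sync byte stops lining up."""
    count = 0
    while pos + TS_PACKET <= len(buf) and buf[pos] == SYNC:
        count += 1
        pos += TS_PACKET
    return count
```

`ts_run_length(buf, pos) * 188` gives a candidate length for the stream. It cannot tell whether two recordings stored back to back are one video or two; that still needs a look at the content.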
I don't regret doing it manually. No program would've come close to this level of precision. Recovering fragmented files is a mindfück however... I have a few ideas on how this can be automated but it would still require frequent human intervention and in the end I doubt it's worth the effort since preventative measures like backups are literally a matter of clicking a button and waiting 60 minutes to avoid this bullshit altogether. -
I would disagree.
Like I said, the programs I used, ZAR in particular, fully recovered an inaccessible drive/partition. Of course there must be info such as the TOC for these to work but they do not read the drive sector by sector. They read the most recent sector and then read the next sector of the actual file. I hardly think that DOSes have changed so much that the next active sector is not referenced at the end of the recent one. To argue that they need educating is unfair in the extreme. Some might but not all.
I have no connection with ZAR. I recommend it because I used it. Remember what ZAR stands for ? Zero Assumption Recovery. So it hardly needs educating.
You might well have the advantage over many in having some advanced knowledge of how disk systems work. But maybe you should put that knowledge to work and devise your own software. Only then might you appreciate just how difficult it is to create a good program for this. -
Backing up the MFT is essential indeed, if you can't back up the whole drive that is.
Hum, that's gonna be quite a mouthful to read again, sorry...
After this incident, I have backed up the MFTs of all my important drives.
There was a bad sector right at the beginning but it did not corrupt the MFT. That occurred when I ran checkdisk. Never happened before but chkdsk completely overwrote the MFT with an empty one.
I once had a 3TB HDD full of TV recordings being identified as 746GB when plugged into a laptop running Windows Vista through a USB enclosure, with a warning to check it for errors, which actually runs CHKDSK under the hood : without thinking twice, I accepted... it did find many errors and attempted to “fix” them... next thing I know, hundreds of files ended up corrupted, with a chunk of file A being now identified as a part of file B, a WMV file now appearing as a completely unrelated MP4 file, interrupted in the middle by a TXT file, and so on... Luckily I had a complete backup (missing only a few unimportant or temporary files), but I still had a hard time trying to figure out what happened. In this case there was no physical failure, no bad sector, yet CHKDSK made a huge mess with no “undo” option. Crazy that they didn't think it through when they designed that tool, and that it is still operating with no safety net on recent Windows systems, after more than 20 years of development and user feedback.
You say photorec and all these other recovery tools only look for headers of a file and not endings, but they should. Look at the ending of any AVI file and observe the pattern. It's 01wb or whatever and then 4 bytes of steadily increasing values and then 00dc or another 01wb. MP4 has a similar pattern. WMV does too but it's not obvious because it's only like 1000 bytes, not 10% of the file. MKV sometimes has some readable text of metadata at the end but it's not consistent.
The developers of these recovery tools need to collaborate with the designers of these media containers.
Regarding MKV files, I reported on the dedicated R-Studio forum that this otherwise excellent software used to have poor results for raw MKV detection, which may have contributed to its vast improvement on that particular aspect in later versions. From what I could see, there are at least three types of MKV headers, depending on where the “matroska” string is located, which is indeed surprising, as usually a file header should have a constant and instantly recognizable pattern.
@DB83
Like I said, the programs I used, ZAR in particular, fully recovered an inaccessible drive/partition. Of course there must be info such as the TOC for these to work but they do not read the drive sector by sector. They read the most recent sector and then read the next sector of the actual file. I hardly think that DOSes have changed so much that the next active sector is not referenced at the end of the recent one. To argue that they need educating is unfair in the extreme. Some might but not all.
You might well have the advantage over many in having some advanced knowledge of how disk systems work.
But maybe you should put that knowledge to work and devise your own software. Only then might you appreciate just how difficult it is to create a good program for this. -
@abo, you have good taste in software tools. I like to preserve timestamps as well. I fücking idolize my childhood and keep it well preserved. Who needs a goddamn diary when you have a trillion zeroes and ones?
I used the same Seagate except it was a 2TB and it failed me almost exactly 5 years after I bought it. At the very first sign of trouble (multiple sectors being reallocated all of a sudden and rapidly rising) I unplugged the fücker and cloned it to the new 2016 Seagate drive of the same capacity with Winhex. Trust me, Winhex is better than any tool specifically made for cloning disks. I have no idea why and neither did Winhex's own authors until recently, when they finally made a product exclusively for that purpose.
It's only useful if the contents are static...
CHKDSK has never failed me for the past 20 years until now so I didn't know any better. It said there was only an error in a few files so my impulse was to let it fix it rather than let this tiny error take up half my day of backing up 250 GB worth of stuff. I know better now. I've come out of so many worse crashes that I didn't think this one could top it.
One time I lost my partition tables for my HDD (which contained backups including for my damaged system drive) and had to recover 8 lost partitions, took me 16 hours to recover the first 7 and the last one was the worst because it was exFAT which no recovery tools recognized. It only had one file in it (a volume file) which was like 200GB and it was fragmented into 4000 pieces. NEVER again did I use exFAT after that.
I'm completely stumped as to how CHKDSK could fück up so bad but I learned my lesson and will never use that again either, sure as hell not without backing up at LEAST the MFT if not the entire damn volume.
I was gonna say I'll only exclude it in the instance of hardware problems but after you told me it screwed you over too when your drive was perfectly healthy, it shows me the obvious. CHKDSK is an obsolete slice of shit that doesn't belong in this millennium.
It seems to me that he said just that... that it must be very tricky, considering how complicated a single file type can be, and the intricacies of each file system, and so on... -
I wrote a reply but lost it. Basically, what I wrote was that I wasn't slamming anyone/thing, just saying it how it is and that the actual developers of the AVI, MP4, WMV and MKV formats should be consulting the authors of data recovery software, not an unqualified enthusiast like myself who doesn't know anything beyond what the headers of those formats 'look' like.