VideoHelp Forum
  1. I am recovering videos from a broken drive and I literally have nothing but raw data to work with. I can already recognize most containers by the first few bytes. AVI starts with RIFF, MP4 has ftypmp* a few bytes in, and so on. What I need is a way to tell when a file ENDS. WMV does the work for me because the file size is in the header, and so does AVI, but there it doesn't matter because the obvious patterns at the end of an AVI give it away.

    Mostly I need help identifying the correct length of MP4s.
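
    For the AVI case mentioned above, the size stored in the header can be read directly. This is a minimal Python sketch, assuming the file sits in one contiguous run and consists of a single RIFF chunk (large AVIs written with the OpenDML extension append further RIFF chunks, which this does not follow). The name `riff_total_size` is made up for illustration.

```python
import struct

def riff_total_size(buf: bytes, offset: int = 0):
    """Return the declared total length of a RIFF file starting at `offset`.

    Returns None when the bytes at `offset` do not look like a RIFF header.
    """
    if len(buf) < offset + 12 or buf[offset:offset + 4] != b"RIFF":
        return None
    # Bytes 4..7 hold a little-endian 32-bit size that counts everything
    # after the first 8 bytes ("RIFF" + the size field itself).
    (payload_size,) = struct.unpack_from("<I", buf, offset + 4)
    return 8 + payload_size
```

    The returned value is only what the header claims; if the file was truncated or fragmented, the bytes at that distance will not be the real end.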
  2. Member
    Join Date
    Mar 2008
    Location
    United States
    For MP4, the mdat + moov atoms are almost the complete file. Look at some good files in a hex editor and you'll see.
    I'm not aware of a place where the absolute size is stored. There might not be one. Perhaps someone else will chime in.
  3. Member
    Join Date
    Aug 2013
    Location
    Central Germany
    Understanding how an MP4 container can be structured (it is similar to MOV and 3GPP) requires studying the "ISO Media" container specifications. Several chunks are mandatory, others are optional, but they all follow a hierarchy; they cannot be placed in arbitrary order. Chunks usually have a size field, so you can skip forward that number of bytes to find the next chunk (possibly with some exceptions for "list" style chunks), until you reach the end of the file.
  4. Originally Posted by davexnet View Post
    For MP4, the mdat + moov atoms are almost the complete file. Look at some good files in a hex editor and you'll see.
    Atoms? What are you talking about? "Almost the complete file" doesn't cut it either. I need to know exactly when it ends. It's obvious with AVI but not with MP4 and others.

    Ligh.de, many of these videos are hours long so I will not have time to inspect every individual chunk of every video to say the very least. I need a way to quickly tell where videos begin and end and so far it's only possible with AVI and WMV. Even this is taking forever where I can only recover a few in one hour.
  5. Member
    Join Date
    Aug 2013
    Location
    Central Germany
    Try to understand the purpose of the following tool:

    http://atomicparsley.sourceforge.net/

    Then we talk again.
  6. Originally Posted by Aludin View Post
    Originally Posted by davexnet View Post
    For MP4, the mdat + moov atoms are almost the complete file. Look at some good files in a hex editor and you'll see.
    Atoms? What are you talking about?
    The size of a file is in the directory entry for the file. Otherwise, for the MP4 container, you have to parse the atom tree to get the size. There is no overall file size entry in the MP4 headers that I know of. Each atom starts with a 4 byte size element (big endian) followed by the four character atom name and then data for that atom.
    Last edited by jagabo; 8th Sep 2018 at 12:01.
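
    The walk described above (4-byte big-endian size, then the 4-character name, skip ahead, repeat) could be sketched in Python roughly like this. Several things here are assumptions and not from the thread: the helper name `mp4_length`, the list of accepted top-level atom names (not exhaustive), the requirement that the file begin with an `ftyp` atom, and the ISO base media rules that a size of 1 means a 64-bit size follows the name while a size of 0 means the atom runs to the end of the file. It only gives a sensible answer when the file is stored contiguously.

```python
import struct

# Top-level atom names this sketch accepts (an assumption, not exhaustive).
KNOWN_TOP_LEVEL = {
    b"ftyp", b"moov", b"mdat", b"free", b"skip", b"wide", b"moof",
    b"mfra", b"uuid", b"meta", b"styp", b"sidx", b"pdin",
}

def mp4_length(buf: bytes, start: int = 0):
    """Sum the declared sizes of consecutive top-level atoms from `start`.

    Returns the declared file length in bytes, or None if the data does not
    begin with an ftyp atom or an atom claims to run to end of file.
    """
    if buf[start + 4:start + 8] != b"ftyp":
        return None
    pos = start
    while pos + 8 <= len(buf):
        size, kind = struct.unpack_from(">I4s", buf, pos)
        if kind not in KNOWN_TOP_LEVEL:
            break  # unknown name: assume the file ended at `pos`
        header = 8
        if size == 1:
            # A 64-bit size is stored right after the atom name.
            if pos + 16 > len(buf):
                break
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            return None  # "to end of file": length is not recoverable here
        if size < header:
            break  # corrupt size field
        pos += size
    return pos - start if pos > start else None
```

    With this, overselecting in the hex editor is harmless: pass the oversized selection in and trim it to the returned length.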
  7. Member DB83's Avatar
    Join Date
    Jul 2007
    Location
    United Kingdom
    Aludin,

    Have you not tried the standard file recovery tools such as Getdataback or ZAR (Zero Assumption Recovery) ? I have had success with both of these.
  8. Ah, I get it now. MP4 is more 'raw' than others so info like size has to be inferred by putting the chunks together. The word atom confused me. Basically, I can overestimate the data I'm selecting with the hex editor and this program will have no problem knowing when it ends. This is doable. Thank you.
    At first I thought I would have to manually add up the data of every single chunk. Derp!

    jagabo, the MFT is lost, that's the problem. Directory indexes are still scattered on the drive but without knowing where on the disk each entry points to, I have no way to know which file is which.

    DB83, I have tried recovery tools though not the ones you mentioned. They are incredibly unreliable so I decided I had no choice but to do it manually. This drive is largely unfragmented yet most of what these programs autorecovered was pieces. Do you know of one which is more interactive? I'd like to see exactly the data it has selected and approve each one.
  9. Member DB83's Avatar
    Join Date
    Jul 2007
    Location
    United Kingdom
    ZAR will preview what it is going to recover. Since it scans the entire disk or partition (selectable) this process will take some time - many hours for a large partition.

    I believe the free version only scans and previews. You would have to purchase the program to do a full recovery of the files you select. But as I said, I recovered an entire partition with the full program, so I am more than happy with it.
  10. Guys, I've figured out what the headers of most formats look like but I need some assistance with the TS format. There doesn't seem to be a fixed pattern. MPEG files are also weird. I noticed most of them start with 00 00 01 Bx but they have this header multiple times in the same video.

    Also, is there a way to know the size of an MKV file?
  11. Transport streams are designed to be picked up midstream (for example, when you turn on your TV you don't have to start watching a show from the beginning) so headers are repeated every half second or so. Transport streams can also change resolution (for example ads are often at a different resolution than the show).
  13. Have you not tried the standard file recovery tools such as Getdataback or ZAR (Zero Assumption Recovery) ? I have had success with both of these.
    To add to the above suggestions, R-Studio and Photorec (part of TestDisk) would be worth trying. Photorec (freeware) has the drawback that it must “blindly” extract every identified file from the entire volume / partition, which can be a lot of data and is not practical if you only need a few specific files, but it can be more efficient at identifying some file types. You can choose which file types you want it to extract / carve by setting the “file options” before proceeding. With R-Studio you can preview files before extracting them (at least with the hexadecimal analyzer, and for most video formats you can actually play the file from within the recovery tree) and choose which ones you want / don't want in each category, within its well-organized hierarchy of file types. R-Studio is constantly improving with regard to raw file detection; for instance it used to be inferior to Photorec at identifying MKV files, and now it's just as efficient. Both programs allow adding custom file signatures if a user is looking for a rare kind of file, but for your purposes that shouldn't be necessary, as the types you mention are very common and well identified.

    As for detecting the end of a file: for file types whose header has no field giving the expected total size, the usual behaviour of such recovery programs is to attribute all subsequent sectors to a given file until the header of the next file is found. But the main problem of raw file recovery is fragmentation: on a drive with regular activity, where lots of files have been written, deleted, then written again, most files (especially large video files) will be fragmented, i.e. the end of a file does not directly follow its beginning. There can be a gap, small or very large, the end can even be located before the beginning, and there can be 2-3 chunks or hundreds. It's totally unpredictable, and there is no regular software that I'm aware of which can efficiently reconstruct fragmented files. Photorec purports to do just that, but in my experience it very rarely gets even one fragmented file complete.

    Fragmentation can be tremendous in the case of files downloaded simultaneously, even when there's plenty of available space. I once recovered almost the whole contents of a failing 3TB HDD, except 6 video files; several of those had been downloaded at the same time and were thus massively fragmented, with as many as 8000-12000 little chunks each. I made a mistake and had to use a very special trick to rebuild them. In a nutshell:

    I knew, thanks to a read scan with HD Sentinel and a few other tricks, that those particular files contained bad sectors, so I put them in a separate folder while I was recovering everything else, hoping not to damage the drive further by insisting repeatedly on damaged areas. It turned out to be a wise move, because when I did try to get those files the state of the drive deteriorated quickly.

    Then, as I was limited in storage space for the recovery, I chose to image only the area where those files were located, with a safety margin before and after, plus the first 10GB, as I thought that it would contain all the metadata needed to extract those files. That proved to be unwise, as the MFT itself was fragmented and had extents near the middle and near the end of the drive. It was stupid, because I already knew of a method – with ddrescue and a complementary tool called ddru_ntfsbitmap – to specifically recover the very important MFT before attempting to recover the data area, but you're bound to do stupid things when your own data is in jeopardy if you don't take some time to think it through and devise a thorough plan of action before attempting anything...

    So I managed to recover most of those files' sectors, but I couldn't extract the actual files with R-Studio or WinHex, as I didn't have the complete MFT. As I found out, those tools can't rely solely on the volume snapshots / scan information backups that they generated to keep track of files' allocation information; they need to interact with the actual MFT. By that point the drive was too damaged to recover the missing chunks. Luckily, I had copied the complete list of sectors allocated to each of those files, obtained with three different tools – HD Sentinel, Recuva, nfi.exe – and with that I managed to create ddrescue scripts to rebuild the actual files by copying the relevant sectors from the image in an automated way... otherwise it would have been waaay too time-consuming to do that manually!
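
    The rebuild step at the end of that story (copying a known list of sector runs out of an image, in file order) was done with ddrescue scripts; the same idea can be sketched in Python. This is not the script that was actually used. It assumes 512-byte sectors, that the sector numbers are relative to the start of the image file, and that the extent list is already in the right order; `rebuild_from_extents` and its parameters are hypothetical names.

```python
SECTOR = 512  # assumed logical sector size

def rebuild_from_extents(image_path, out_path, extents, file_size=None):
    """Concatenate sector runs from a disk image into one output file.

    extents   -- list of (first_sector, sector_count) tuples, in file order
    file_size -- optional real size in bytes, used to trim the slack that
                 follows the data in the last sector run
    Returns the number of bytes in the rebuilt file.
    """
    written = 0
    with open(image_path, "rb") as img, open(out_path, "wb") as out:
        for first_sector, count in extents:
            img.seek(first_sector * SECTOR)
            chunk = img.read(count * SECTOR)
            out.write(chunk)
            written += len(chunk)
        if file_size is not None and file_size < written:
            out.truncate(file_size)
            written = file_size
    return written
```

    Lists exported by NTFS tools are often expressed in clusters relative to the start of the volume, so they may need converting before they fit this form.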

    I am recovering videos from a broken drive and I literally have nothing but raw data to work with.
    That means that all the file system data is completely gone? How did you get that “raw data”, perhaps by imaging or cloning the defective drive with something like ddrescue? Are there lots of bad sectors at the beginning? (Usually the MFT, Master File Table, which contains all the metadata and file allocation information, is located within the first few gigabytes.)
    Both R-Studio and Photorec can open a volume image as well as a physical drive (with Photorec either launch it via the command line and specify the path to the image, or – on Windows – right-click on the image, select “open with...”, then locate photorec.exe).

    Also, is there a way to know the size of an MKV file?
    There's a size field in the header. Its exact location varies slightly, as there are several types of MKV signatures. For instance, if I open a 1145529656-byte MKV file in WinHex, I can see on the fourth line: 44 47 65 04, which is 1145529604 in decimal (which must be the size of the file minus the 52-byte header). Both Photorec and R-Studio seem to rely on this information when detecting MKV files. For instance, if Photorec extracts an MKV file which is smaller than the size found in the header, it considers the file “broken” (the file is extracted with a name like “b123456.mkv” instead of “f123456.mkv”, where “123456” is the number of the first allocated sector).
    Last edited by abolibibelot; 28th Oct 2018 at 14:59.
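
    The reason the location of that size field varies is that Matroska is built on EBML, where both element IDs and sizes are variable-length integers. Here is a Python sketch of reading it, assuming the common layout of one EBML header element followed directly by the Segment element; the helper names are invented, and the 35-byte header payload in the usage example is chosen only so that the total matches the 52-byte difference quoted above.

```python
EBML_ID = 0x1A45DFA3     # ID of the EBML header element
SEGMENT_ID = 0x18538067  # ID of the Segment element that holds everything else

def read_vint(buf, pos, keep_marker=False):
    """Decode an EBML variable-length integer; return (value, byte_length)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vint longer than 8 bytes")
    length, mask = 1, 0x80
    while not first & mask:  # leading zero bits give the length
        mask >>= 1
        length += 1
    if pos + length > len(buf):
        raise ValueError("truncated vint")
    # IDs keep the length-marker bit, sizes drop it.
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length

def mkv_length(buf, start=0):
    """Return header bytes + declared Segment size, or None if unknown."""
    try:
        pos = start
        ident, n = read_vint(buf, pos, keep_marker=True)
        if ident != EBML_ID:
            return None
        pos += n
        size, n = read_vint(buf, pos)
        pos += n + size  # skip the EBML header payload
        ident, n = read_vint(buf, pos, keep_marker=True)
        if ident != SEGMENT_ID:
            return None
        pos += n
        size, n = read_vint(buf, pos)
        if size == (1 << (7 * n)) - 1:
            return None  # all ones: size unknown (e.g. live streams)
        return pos + n + size - start
    except (ValueError, IndexError):
        return None
```

    For example, a header ending in the size bytes 01 00 00 00 44 47 65 04 reproduces the numbers from the post:

```python
header = (bytes.fromhex("1A45DFA3") + b"\xA3" + b"\0" * 35
          + bytes.fromhex("18538067") + bytes.fromhex("0100000044476504"))
print(len(header), mkv_length(header))  # 52 1145529656
```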
  14. Wow, had no idea of your reply abolibibelot. I must have email notifications turned off. I'll read up on it all tomorrow. For the future, separate your paragraphs so it's easier to read. kthx

    But just so everyone knows, I've already recovered everything by hand so this case is pretty much closed. It took forever and I never wanna do anything like this again. My left hand is ******* burning. I learned a lot in the process and can easily see how this can be automated but in the end I didn't wanna pay for that program that DB83 recommended so I did it manually. The beginnings and ends of all these files (except MKV and WMV) are so obvious.

    For the record, TS files always begin with bytes 47 40 and the header is followed by a bunch of repeating bytes so it's hard to miss. The only problem is that the header repeats in the same video much like MPEG so it's hard to tell if it's one video or multiples.
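
    One way to tell a real transport stream from a stray 47 40 is to use the packet structure: per the MPEG-2 TS format (not stated in the thread), packets are 188 bytes long and each one starts with the sync byte 0x47. A rough Python sketch follows; the function name and the `min_packets` threshold are arbitrary choices, and Blu-ray style M2TS files with 192-byte packets are not handled.

```python
TS_PACKET = 188  # standard MPEG-2 transport stream packet length

def ts_run_length(buf, start=0, min_packets=5):
    """Measure a run of back-to-back TS packets beginning at `start`.

    Returns the run length in bytes, or 0 when fewer than `min_packets`
    consecutive packets carry the 0x47 sync byte (probably a false hit).
    """
    pos = start
    packets = 0
    while pos + TS_PACKET <= len(buf) and buf[pos] == 0x47:
        packets += 1
        pos += TS_PACKET
    return packets * TS_PACKET if packets >= min_packets else 0
```

    Where the run stops is a candidate end of the recording, but it cannot say whether two recordings stored back to back are one video or two.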
  15. Member DB83's Avatar
    Join Date
    Jul 2007
    Location
    United Kingdom
    ^^ You only get a notification for the first post after your latest reply so you should get a notification about this reply. That is until you visit the forum again.

    As for doing file recovery manually I take my hat (if I wore one) off to you. But as you said the drive is not fragmented. To attempt this on a fragmented one would, in my estimation, be an even more thankless task.

    And sometimes to pay even a small sum is worthwhile. Unless you have tons of time on your hands
  16. Hell of a story, abolibibelot. Backing up the MFT is essential indeed, if you can't back up the whole drive that is. After this incident, I have backed up the MFTs of all my important drives.
    You say photorec and all these other recovery tools only look for headers of a file and not endings, but they should. Look at the ending of any AVI file and observe the pattern. It's 01wb or whatever and then 4 bytes of steadily increasing values and then 00dc or another 01wb. MP4 has a similar pattern. WMV does too but it's not obvious because it's only like 1000 bytes, not 10% of the file. MKV sometimes has some readable text of metadata at the end but it's not consistent.
    The developers of these recovery tools need to collaborate with the designers of these media containers.

    That means that all the file system data is completely gone ? How did you get that “raw data”, perhaps by imaging or cloning the defective drive with something like ddrescue ?
    From what I see, yes, completely gone. I opened the drive in WinHex.

    Are there lots of bad sectors at the begining ? (Usually the MFT, Master File Table, which contains all the metadata and file allocation information, is located within the first few gigabytes.)
    There was a bad sector right at the beginning but it did not corrupt the MFT. That occurred when I ran checkdisk. Never happened before but chkdsk completely overwrote the MFT with an empty one.

    It's good to know MKV does store size info but the address should always be fixed like WMV is.

    DB83, the drive actually is fragmented but not much. About 10% of the files were, 1/4 of them horribly.
    I see now that this process can be automated to a large degree but not as accurate as human intervention. I only have to notice patterns in the bytes to see that it's obviously the beginning or end of a video format but this isn't as easy to teach a program. I mean, searching for hex "4740" on the whole drive returned millions of results. If it's really a .TS, it would start with 4740 and then have a bunch of patterns but they aren't fixed enough to relegate to a simple keyword that a program can look for.
    I don't regret doing it manually. No program would've come close to this level of precision. Recovering fragmented files is a mindfück however... I have a few ideas on how this can be automated but it would still require frequent human intervention and in the end I doubt it's worth the effort since preventative measures like backups are literally a matter of clicking a button and waiting 60 minutes to avoid this bullshit altogether.
  17. Member DB83's Avatar
    Join Date
    Jul 2007
    Location
    United Kingdom
    I would disagree.

    Like I said, the programs I used, ZAR in particular, fully recovered an inaccessible drive/partition. Of course there must be info such as the TOC for these to work, but they do not read the drive sector by sector. They read the most recent sector and then read the next sector of the actual file. I hardly think that DOSes have changed so much that the next active sector is not referenced at the end of the recent one. To argue that they need educating is unfair in the extreme. Some might, but not all.

    I have no connection with ZAR. I recc it because I used it. Remember what ZAR stands for ? Zero Assumption Recovery. So it hardly needs educating.

    You might well have the advantage over many having some advanced knowledge how disk systems work. But maybe you should put that knowledge to work and devise your own software. Only then might you appreciate just how difficult it is to create a good program for this.
  18. Backing up the MFT is essential indeed, if you can't back up the whole drive that is.
    But in this particular case, it was still wise to copy the files directly from within Windows, by order of importance, using Robocopy or SynchronizeIt (among the rare Windows tools which preserve all timestamps of copied files/folders, including the folders' timestamps), considering that only those large video files contained bad sectors. (To be precise, one more unimportant file turned out to be unreadable; I could have identified it using Defraggler, which is primarily a defragmentation tool but is also the only tool I know which can display a list of the files contained within a particular area of a storage volume.) The alternative was a full clone with a specialized tool like ddrescue or the newer HDDSuperClone, which is the usually recommended way to proceed with a failing HDD. If I had gone the clone way, since the bad areas started near the 2TB mark, I may have ended up with a partial MFT (since some chunks of it were at the very end of the drive) plus about 1TB worth of data totally unreadable once the drive became really unstable. Each time the drive attempted to access a bad area, the “pending sector count” increased and the “health” status in Hard Disk Sentinel got worse; I had to move those 6 files from the command line, because when selecting them in Windows 7 Explorer the files would be parsed to display a preview, which was enough to add some more bad sectors... All in all I consider myself lucky, considering that I had used that drive for months with no backup, especially now that I know how dreadful the reputation of Seagate “DM” drives is among data recovery professionals, as can be read over and over on forum.hddguru.com (the drive in question was a ST3000DM001, which is apparently the worst offender).

    Hum, that's gonna be quite a mouthful to read again, sorry...

    After this incident, I have backed up the MFTs of all my important drives.
    It's only useful if the contents are static... otherwise you should make regular backups. But then you might as well regularly back up the files themselves! Still, it's not a bad idea as a complementary measure: the MFT is relatively small and a copy of it can save you from major screw-ups (accidental formatting, a CHKDSK scan gone awry...).

    There was a bad sector right at the beginning but it did not corrupt the MFT. That occurred when I ran checkdisk. Never happened before but chkdsk completely overwrote the MFT with an empty one.
    CHKDSK can efficiently fix small filesystem inconsistencies, but it can indeed seriously screw up a volume with more severe logical damage, even more so in a case of hardware damage. In a case like this, the right course of action would have been to first check the SMART information (with HDTune, CrystalDiskInfo, or the better but not free HD Sentinel), then, if bad sectors were reported, do a full clone (ideally two if there is no prior backup), then try to access the clone, then, if there is no access or only partial access, extract the data with a good recovery program (Recuva is excellent for freeware, R-Studio is much more advanced but not free), and only then run CHKDSK on the clone to try an in-place filesystem repair (knowing that it might fail).

    I once had a 3TB HDD full of TV recordings identified as 746GB when plugged into a portable computer running Windows Vista through a USB enclosure, with a warning to check it for errors, which actually runs CHKDSK under the hood. Without thinking twice, I accepted... it did find many errors and attempted to “fix” them... next thing I knew, hundreds of files had ended up corrupted, with a chunk of file A now identified as a part of file B, a WMV file now appearing as a completely unrelated MP4 file, interrupted in the middle by a TXT file, and so on... Luckily I had a complete backup (missing only a few unimportant or temporary files), but I still had a hard time trying to figure out what happened. In this case there was no physical failure, no bad sector, yet CHKDSK made a huge mess with no “undo” option. Crazy that they didn't think it through when they designed that tool, and that it is still operating with no safety net on recent Windows systems, after more than 20 years of development and user feedback.

    You say photorec and all these other recovery tools only look for headers of a file and not endings, but they should. Look at the ending of any AVI file and observe the pattern. It's 01wb or whatever and then 4 bytes of steadily increasing values and then 00dc or another 01wb. MP4 has a similar pattern. WMV does too but it's not obvious because it's only like 1000 bytes, not 10% of the file. MKV sometimes has some readable text of metadata at the end but it's not consistent.
    The developers of these recovery tools need to collaborate with the designers of these media containers.
    Each of these programs uses its own method, and different methods for different file types. I don't know exactly how they operate, but the clever ones must take into account everything that is documented about a particular file format. JPG pictures, for instance, have a constant signature in the footer (a valid JPG file ends with FF D9), so they're easy to carve accurately. Other file types don't have a particular footer pattern, and in that case – or in the case of a fragmented file – the only possibility is to keep extracting sectors until another header is found. There are more powerful dedicated file carving programs, which attempt to analyze the inner structure of each file in a much more sophisticated way in order to assemble fragments when possible, like this one. I haven't tried it so I couldn't say how efficient it is, but the author(s) definitely seem(s) to know their stuff.
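
    The JPG case can be sketched in a few lines of Python to show what header/footer carving amounts to. This is a naive illustration and not how any of the tools named here work internally: it assumes unfragmented files, and it will cut a picture short if an embedded thumbnail carries its own FF D9 before the real end. `carve_jpegs` is a hypothetical name.

```python
def carve_jpegs(buf: bytes):
    """Return (start, end) byte offsets of candidate JPEG files in `buf`.

    A candidate starts at an FF D8 FF header and ends just after the first
    FF D9 footer that follows it.
    """
    spans = []
    pos = 0
    while True:
        start = buf.find(b"\xff\xd8\xff", pos)
        if start < 0:
            break
        end = buf.find(b"\xff\xd9", start + 3)
        if end < 0:
            break  # header without footer: truncated or fragmented
        spans.append((start, end + 2))
        pos = end + 2
    return spans
```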

    Regarding MKV files, I reported on the dedicated R-Studio forum how that otherwise excellent software used to have poor results for raw MKV detection, which may have contributed to its vast improvement in that particular respect in later versions. From what I could see, there are at least three types of MKV headers, depending on where the “matroska” string is located, which is indeed surprising, as a file header should usually have a constant and instantly recognizable pattern.

    @DB83
    Like I said, the programs I used, ZAR in particular, fully recovered an inaccessible drive/partition. Of course there must be info such as the TOC for these to work, but they do not read the drive sector by sector. They read the most recent sector and then read the next sector of the actual file. I hardly think that DOSes have changed so much that the next active sector is not referenced at the end of the recent one. To argue that they need educating is unfair in the extreme. Some might, but not all.
    For most common file formats there is no information within the file itself which indicates which sector should come after which, if that's what you mean. All that information is located in the MFT, and if the MFT does get wiped, raw file carving becomes the only option, and it can't be perfect except in very simple situations.

    You might well have the advantage over many having some advanced knowledge how disk systems work.
    It seems to me that he acquired that knowledge the hard way, while investigating this particular issue... and created this very thread a few months ago for that purpose ! (And actually “advanced” knowledge would mean understanding in depth the role and significance of each field in the header and other structures of a file, for at least a good amount of the most common file types...)

    But maybe you should put that knowledge to work and devise your own software. Only then might you appreciate just how difficult it is to create a good program for this.
    It seems to me that he said just that... that it must be very tricky, considering how complicated a single file type can be, and the intricacies of each file system, and so on...
  19. @abo, you have good taste in software tools. I like to preserve timestamps as well. I fücking idolize my childhood and keep it well preserved. Who needs a goddamn diary when you have a trillion zeroes and ones?

    I used the same seagate except it was a 2TB and it failed me almost exactly 5 years after I bought it. At the very first sign of trouble (multiple sectors being reallocated all of a sudden and rapidly rising) I unplugged the fücker and cloned it to the new 2016 Seagate drive of the same capacity with Winhex. Trust me, Winhex is better than any tool specifically made for cloning disks. I have no idea why and neither did Winhex's own authors until recently where they finally made a product exclusively for that purpose.

    It's only useful if the contents are static...
    Indeed and they are. My system drive is an SSD which I make regular backups of. The larger data on the HDD is stuff I would miss less if lost but as you can see I still went to extreme measures to recover my stuff when it was lost and half of it I haven't had a chance to even go thru yet.

    CHKDSK has never failed me for the past 20 years until now so I didn't know any better. It said there was only an error in a few files so my impulse was to let it fix it rather than let this tiny error take up half my day of backing up 250 GB worth of stuff. I know better now. I've come out of so many worse crashes that I didn't think this one could top it.
    One time I lost my partition tables for my HDD (which contained backups including for my damaged system drive) and had to recover 8 lost partitions, took me 16 hours to recover the first 7 and the last one was the worst because it was exFAT which no recovery tools recognized. It only had one file in it (a volume file) which was like 200GB and it was fragmented into 4000 pieces. NEVER again did I use exFAT after that.

    I'm completely stumped as to how CHKDSK could fück up so bad but I learned my lesson and will never use that again either, sure as hell not without backing up at LEAST the MFT if not the entire damn volume.
    I was gonna say I'll only exclude it in the instance of hardware problems but after you told me it screwed you over too when your drive was perfectly healthy, it shows me the obvious. CHKDSK is an obsolete slice of shit that doesn't belong in this millennium.

    It seems to me that he said just that... that it must be very tricky, considering how complicated a single file type can be, and the intricacies of each file system, and so on...
    ...and this is a USB flash drive on top of everything, wear-leveling and all. Yeah, complicated.
  20. Originally Posted by DB83 View Post
    I would disagree.

    Like I said, the programs I used, ZAR in particular, fully recovered an inaccessible drive/partition. Of course there must be info such as the TOC for these to work, but they do not read the drive sector by sector. They read the most recent sector and then read the next sector of the actual file. I hardly think that DOSes have changed so much that the next active sector is not referenced at the end of the recent one. To argue that they need educating is unfair in the extreme. Some might, but not all.

    I have no connection with ZAR. I recc it because I used it. Remember what ZAR stands for ? Zero Assumption Recovery. So it hardly needs educating.

    You might well have the advantage over many having some advanced knowledge how disk systems work. But maybe you should put that knowledge to work and devise your own software. Only then might you appreciate just how difficult it is to create a good program for this.
    I wrote a reply but lost it. Basically, what I wrote was that I wasn't slamming anyone/thing, just saying it how it is and that the actual developers of the AVI, MP4, WMV and MKV formats should be consulting the authors of data recovery software, not an unqualified enthusiast like myself who doesn't know anything beyond what the headers of those formats 'look' like.