VideoHelp Forum
Thread
  1. (I already asked this on SuperUser but haven't gotten any reply so far, so I have little hope of getting useful insight over here, but let's try anyway...)


    Let's say there is a ZIP or RAR archive on a file-sharing network: an old archive that has been out there for a long time, containing hundreds of small files (typically JPG pictures). Some parts are missing, say 20MB out of 500MB, and there is no longer a single complete source, nor is there likely to ever be one. So anyone attempting to download it gets stuck with a large, unusable file. (Well, the complete files inside can still be extracted, but most users either wait for the download to complete or delete it altogether after a while.)

    But I may have all the individual files contained in those missing parts, found in other similar archives, or acquired from another source. The goal would be to sort of “revive” such a broken archive, in a case like this where only a small part is missing, so that it can be shared again.
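    Since a ZIP archive ends with a central directory listing every member's offset, CRC-32 and compressed size, the files that fall inside the missing parts can be identified exactly, provided the tail of the download is intact. A minimal Python sketch using the standard zipfile module; the `holes` list of missing byte ranges is a hypothetical input you would read off your P2P client:

```python
import os
import zipfile


def members_in_holes(path, holes):
    """Return (name, CRC-32, compressed size) for members overlapping a hole.

    `holes` is a list of (start, end) byte offsets, end exclusive, e.g. as
    read off a P2P client's chunk map (hypothetical input format).
    """
    size = os.path.getsize(path)
    # Opening the archive only parses the central directory at the end of the
    # file, so this works as long as the tail of the download is complete.
    with zipfile.ZipFile(path) as zf:
        infos = sorted(zf.infolist(), key=lambda i: i.header_offset)
    hits = []
    for n, info in enumerate(infos):
        start = info.header_offset
        # A member runs up to the next local header; for the last member we
        # conservatively use the file size (which also covers the central directory).
        end = infos[n + 1].header_offset if n + 1 < len(infos) else size
        if any(start < h_end and h_start < end for h_start, h_end in holes):
            hits.append((info.filename, info.CRC, info.compress_size))
    return hits
```

    Each hit's CRC-32 and compressed size are what a candidate replacement must reproduce, which gives a quick check before any hex editing.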

    If an archive is stored without compression, the process is tedious enough (I've done this a few times recently, painstakingly copying each file with a hexadecimal editor, reconstructing each file's header, then verifying that the hash of the result matched that of the original archive). But it gets really tricky when compression is involved: it is not possible to simply copy and paste the contents of the missing files, since they first have to be compressed with exactly the same parameters as the incomplete archive for the actual binary content to match.
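    For the stored ("no compression") case, the per-file header rebuild can be scripted instead of done by hand in a hex editor. A sketch assuming the classic 30-byte local file header layout from PKWARE's APPNOTE, with no extra field and no data descriptor; the `version` and `flags` values are assumptions and should be copied from an intact neighbouring header in the same archive:

```python
import struct
import zlib


def stored_local_header(name, data, dos_time, dos_date, version=10, flags=0):
    """Build the local file header of an uncompressed (method 0) ZIP member.

    `version` and `flags` should be copied from an intact header in the same
    archive; the defaults here are assumptions.
    """
    fname = name.encode("cp437")  # legacy ZIP name encoding (flag bit 11 unset)
    fixed = struct.pack(
        "<IHHHHHIIIHH",
        0x04034B50,        # signature, "PK\x03\x04" on disk
        version,           # version needed to extract
        flags,             # general purpose bit flags
        0,                 # compression method 0 = stored
        dos_time,          # MS-DOS time: hour<<11 | minute<<5 | second//2
        dos_date,          # MS-DOS date: (year-1980)<<9 | month<<5 | day
        zlib.crc32(data),  # CRC-32 of the file content
        len(data),         # compressed size == uncompressed size when stored
        len(data),         # uncompressed size
        len(fname),        # file name length
        0,                 # extra field length (none)
    )
    return fixed + fname
```

    In a stored archive the member is then simply this header immediately followed by the file's raw bytes.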

    For instance, I have an incomplete ZIP file with a size of 372MB, missing 18MB. I identified a picture set contained within the missing part in another, larger archive. Fortunately the timestamps seem to be exactly the same, but unfortunately the compression parameters aren't: the compressed sizes are different and the binary contents won't match. So I decompressed that set and attempted to re-compress it as ZIP using WinRAR 5.40, testing all the available parameters, and checked whether the output matched (each file should have the exact same compressed size and the same binary content when examined with the hex editor), but I couldn't get that result. So the incomplete archive was probably created with different software and/or a different version, using a different compression algorithm or at least a different encoder.
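    Inside a ZIP, a method 8 member is a raw Deflate stream, so one way to run this kind of test systematically is to brute-force the settings of a Deflate encoder and compare byte for byte. The sketch below only searches zlib's parameters (level, memLevel, strategy, with a fixed 32 KiB window). That is an assumption: WinZip, PKZIP and WinRAR ship their own Deflate encoders, which can produce different valid streams from the same input, so an empty result does not rule out a given "level" in those programs.

```python
import itertools
import zlib


def find_zlib_params(raw, target):
    """Return every (level, memLevel, strategy) for which zlib reproduces `target`.

    `raw` is the original file content and `target` the member's compressed
    bytes as copied out of the archive (right after its local header).
    """
    strategies = (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED)
    matches = []
    for level, mem_level, strategy in itertools.product(range(1, 10), range(1, 10), strategies):
        # wbits=-15: raw Deflate with a 32 KiB window and no zlib header/trailer.
        c = zlib.compressobj(level, zlib.DEFLATED, -15, mem_level, strategy)
        if c.compress(raw) + c.flush() == target:
            matches.append((level, mem_level, strategy))
    return matches
```

    Several settings can legitimately produce identical output for small inputs, which is why the function returns a list rather than a single answer.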

    Now, is it possible, by examining the file's header, to know exactly which application was used to create it, and with which exact parameters? Do the compression algorithms get updated with each new version of a particular program, or only with some major updates? Are the ZIP algorithms in WinRAR different from those in WinZIP, 7-Zip, or other implementations? Does the hardware have any bearing on the outcome of ZIP / RAR compression (for instance a single-core or multi-core CPU, a CPU with or without a specific instruction set, or the amount of RAM), or even the operating system environment? (In which case it would be a nigh impossible task.)

    The header of the ZIP file mentioned above is as follows:
    Code:
    50 4B 03 04 14 00 02 00 08 00 B2 7A B3 2C 4C
    5D 98 15 F1 4F 01 00 65 50 01 00 1F 00 00 00
    I tried to find information about the ZIP header structure, but so far have come up with nothing conclusive regarding what I'm looking for.
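    The 30 bytes quoted above are exactly the fixed part of a ZIP local file header as described in PKWARE's APPNOTE, and they can be decoded field by field. A Python sketch: for this header it gives version 2.0 needed to extract, flags 0x0002 (for method 8, flag bits 1-2 = "maximum" compression option), method 8 (Deflate), a timestamp of 2002-05-19 15:21:36, CRC-32 15985D4C, 86001 bytes compressed out of 86117, and a 31-byte file name. These are clues rather than a fingerprint: several tools set the option bits, and the local header has no "version made by" field. That field (host system and spec version of the creating tool) lives in the central directory entry, which Python exposes as `ZipInfo.create_system` and `ZipInfo.create_version`; at best it is a partial hint about which program wrote the archive.

```python
import struct

# The 30 header bytes quoted above.
HEADER = bytes.fromhex(
    "504B0304140002000800B27AB32C4C"
    "5D9815F14F0100655001001F000000"
)


def parse_local_header(buf):
    """Decode the fixed 30-byte part of a ZIP local file header."""
    (sig, version, flags, method, mtime, mdate, crc,
     csize, usize, name_len, extra_len) = struct.unpack("<IHHHHHIIIHH", buf[:30])
    if sig != 0x04034B50:
        raise ValueError("not a ZIP local file header")
    return {
        "version_needed": f"{version // 10}.{version % 10}",
        "flags": flags,
        # For method 8 (Deflate), flag bits 1-2 record the compression option.
        "deflate_option": ("normal", "maximum", "fast", "super fast")[(flags >> 1) & 3],
        "method": method,
        # MS-DOS date/time: 2-second resolution, years counted from 1980.
        "modified": (1980 + (mdate >> 9), (mdate >> 5) & 0xF, mdate & 0x1F,
                     mtime >> 11, (mtime >> 5) & 0x3F, (mtime & 0x1F) * 2),
        "crc32": f"{crc:08X}",
        "compressed_size": csize,
        "uncompressed_size": usize,
        "name_length": name_len,
        "extra_length": extra_len,
    }
```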

    The problem seems even more complicated with RAR files (I also have a few with “holes”), as they don't seem to have a complete index of their contents in the header or footer the way ZIP archives do. If I'm not mistaken, each file is referenced only by its own header, and without the complete list of missing files it's almost certainly a fool's errand. On the other hand, I managed to find several versions of the rar.exe command-line compressor, with which I could quickly run tests in the hope of finding the right one. WinZIP, by contrast (which I would guess is the software most likely used to create a ZIP archive around 2003, based on the files' timestamps), apparently only works through its GUI, and installing a bunch of older versions just to run such tests would be totally impractical and unreasonable for what is already quite a foolish endeavour in the first place!
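    For RAR 1.5 to 4.x archives, the per-file headers can at least be walked from the start of the file until the chain breaks at the first hole. A sketch based on the block layout in RARLAB's technote; RAR5 archives (starting with `Rar!\x1a\x07\x01\x00`) use a different layout and are not handled. Each file header records UNP_VER (the version needed to extract) and a METHOD byte from "store" to "best", which narrows down the settings but not the exact WinRAR build:

```python
import struct

# METHOD byte values for RAR 1.5-4.x archives, per RARLAB's technote.
METHODS = {0x30: "store", 0x31: "fastest", 0x32: "fast",
           0x33: "normal", 0x34: "good", 0x35: "best"}


def walk_rar4_files(buf):
    """Yield (offset, name, packed_size, unp_ver, method) for each file header.

    Stops at the first block whose header looks invalid, which is where a
    zero-filled hole in an incomplete download typically breaks the chain.
    """
    if not buf.startswith(b"Rar!\x1a\x07\x00"):
        raise ValueError("not a RAR 1.5-4.x archive (RAR5 uses another layout)")
    pos = 0
    while pos + 7 <= len(buf):
        # Common block header: HEAD_CRC, HEAD_TYPE, HEAD_FLAGS, HEAD_SIZE.
        _crc, htype, flags, hsize = struct.unpack_from("<HBHH", buf, pos)
        if hsize < 7:
            break  # zeros or garbage: the block chain is broken here
        add = 0
        if flags & 0x8000:  # ADD_SIZE follows (PACK_SIZE for file headers)
            if pos + 11 > len(buf):
                break
            (add,) = struct.unpack_from("<I", buf, pos + 7)
        if htype == 0x74:  # file header: 32 fixed bytes before the name
            name_off = pos + 32
            if flags & 0x100:  # HIGH_PACK_SIZE / HIGH_UNP_SIZE for files > 4 GiB
                name_off += 8
            if name_off > len(buf):
                break
            unp_ver, method, name_size = struct.unpack_from("<BBH", buf, pos + 24)
            if flags & 0x100:
                (high,) = struct.unpack_from("<I", buf, pos + 32)
                add += high << 32
            # Unicode names store an ASCII name, a NUL, then encoded data.
            raw_name = buf[name_off:name_off + name_size].split(b"\x00")[0]
            yield (pos, raw_name.decode("latin-1"), add, unp_ver,
                   METHODS.get(method, hex(method)))
        elif htype == 0x7B:  # end-of-archive block
            break
        pos += hsize + add
```

    Resynchronising after a hole (scanning forward for the next plausible file header) is left out; doing that on real archives would also require checking the header CRCs to reject false matches.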

    Thanks.
    Last edited by abolibibelot; 14th Nov 2019 at 15:55.
  2. Member DB83
    Join Date
    Jul 2007
    Location
    United Kingdom
    Maybe this reply over-simplifies the 'solution'.

    Have you attempted a 'repair' of the archive? Run WinRAR >> highlight the file >> choose Repair.
  3. Well, of course that won't work, since the goal is to reconstruct the original archive file, with the same hash, so that people who have had that file in their queue for months or years can finally complete it and share it in turn (if it so happens that I do have all the missing files from the archive, then I don't really need to complete it for myself).

    Repairing is useful for extracting all the complete files from an incomplete archive. For instance, if file number 150 out of 500 is corrupted because of a large "hole" of missing data, extraction with WinRAR usually stops right away and hundreds of files aren't recovered, whereas the repair process creates a new archive with a modified structure that makes it possible to extract all valid files beyond the "hole". But it still won't recover the files that have even a few bytes located in that "hole". (If the archive had recovery record data, it would theoretically be possible to actually repair mild corruption, but with several megabytes missing it's useless.)

    What I'm looking for here is way more advanced, and perhaps even impossible... As I wrote, I managed to do it for archives created in "store only" mode, i.e. with no compression. The problem with actually compressed archives is that the exact outcome of file compression apparently varies a lot, and I'm trying to find out how to regenerate exactly the same compressed output as in those archives created 15+ years ago, without knowing how they were created in the first place.
    Last edited by abolibibelot; 14th Nov 2019 at 19:19.
  4. Not possible; just compress the missing files into a new archive and move on.

    Compression dictionaries differ according to the total input files, compressor version, etc., so you will never reconstruct the original with your current setup.
    Last edited by teodz1984; 14th Nov 2019 at 19:38.
  5. Quote:
    compression dictionaries differ acc to total files input.. compressor version etc..
    so you will never reconstruct the original with your current setup
    Could you elaborate a bit? Do you mean that, even with the same compressor version on the same setup, with a per-file compression scheme (as opposed to the so-called "solid" mode), compressing 100 files and then compressing 99 of those 100 will produce a different result? I did a quick test: I compressed 8 small files as ZIP with the "good" setting (with WinRAR 5.40 again), then compressed 7 of those 8. The order in which the files are positioned within each archive differs, but the binary content of each file present in both archives is exactly identical. So it would seem that each file is processed independently, in a reproducible way, regardless of the total number of files.
    Is there a place where I could find a thorough list of the factors that affect the outcome of file compression? Do you know if hardware is such a factor?
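    The quick test above matches how non-solid ZIP works: each member is compressed as an independent stream, so for a given encoder and settings, a member's compressed bytes depend only on its own content. A small Python illustration, with zlib standing in for whichever encoder is actually used (an assumption), contrasting that with a solid-style stream over all members:

```python
import zlib


def deflate(data):
    """Raw Deflate (no zlib header), the form used for ZIP method 8 members."""
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()


def per_file(files):
    # Non-solid, ZIP-style: each member is its own independent stream.
    return {name: deflate(data) for name, data in files.items()}


def solid(files):
    # Solid-style: one stream over all members in a fixed order, so the bytes
    # produced for a member depend on the members compressed before it.
    return deflate(b"".join(files[name] for name in sorted(files)))
```

    Dropping one of eight files leaves every other member's `per_file` output unchanged, while the `solid` output changes as a whole.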

    Quote:
    and move on..
    Yeah, that's somethin' I have trouble with generally speakin'... é_è
  6. https://en.wikipedia.org/wiki/Zip_(file_format)

    Compare the same files compressed with different versions of WinZip, WinRAR, etc.: each produces a different result, not a binary-identical one.
  7. As you can see in the documentation, the algorithms have changed over the years, so the only way to recreate the archive is to have the original software do the recompression.
  8. Dinosaur Supervisor KarMa
    Join Date
    Jul 2015
    Location
    US
    7-Zip should allow you to extract incomplete archives; obviously it will just stop once the missing block is reached.
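    With ZIP, a script can go further than stopping at the first missing block, because the central directory lists every member. A sketch with Python's zipfile (not 7-Zip) that skips members failing decompression or the CRC check and keeps the rest; it flattens paths to base names, which assumes unique file names such as a single picture set:

```python
import zipfile
import zlib
from pathlib import Path


def salvage(path, dest):
    """Extract every member that decompresses and passes its CRC check.

    Members are located through the central directory, so a damaged member
    is skipped instead of ending the whole run. Paths are flattened to base
    names (assumes unique names).
    """
    good, bad = [], []
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            try:
                data = zf.read(info)  # raises BadZipFile on a CRC-32 mismatch
            except (zipfile.BadZipFile, zlib.error, EOFError):
                bad.append(info.filename)
                continue
            (out_dir / Path(info.filename).name).write_bytes(data)
            good.append(info.filename)
    return good, bad
```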
  9. I'll try to revive this thread by asking a related question, probably easier to answer:
    What were the most likely software tools used to create a ZIP archive for each time period ? For instance : 2002-2003, 2005-2006, 2008-2010, 2016-2017...
    For a long time, if I'm not mistaken, WinZIP was the default and most common program used for that purpose. The problem is that it seems to require a full-blown install to work properly: I couldn't find any standalone CLI version in my old WinZIP installs on Windows XP (unlike WinRAR, which includes a standalone rar.exe), so it would be very complicated to test several versions in a row.
  10. Well, at least I got a reply... To make it remotely relevant to that specific issue: what year was it when you were 14?

    (to anyone reading this)
    Generally speaking, approximately when did WinZIP stop being the default / most common software to create ZIP archives on Windows ?
  11. Member Cornucopia
    Join Date
    Oct 2001
    Location
    Deep in the Heart of Texas
    ?? 2010 - 2013 ?? I've been using 7zip at least since then, as have my colleagues.
    But WinZip always had competition from WinRAR, PKZIP itself, etc. It may have been the leader, but not overwhelmingly.

    Scott
  12. Alright, thanks. So that makes it nearly impossible to guess which ZIP compressor might have been used by “some regular dude uploading stuff on teh Internetz” at a given point in time. Since each implementation apparently has its own combination of default settings and whatnot, since there is apparently no way to determine which specific utility was used based on the header, and since I'm not even sure whether and how the hardware affects compression, the odds of achieving the intended result are very low indeed. (I once managed to re-create a multi-part compressed RAR archive strictly identical to the one I had downloaded, with the version of WinRAR I had installed, but that was quite lucky, and the RAR format has fewer “moving parts” since, as far as I know, only WinRAR can create such archives. The interesting part, though, is that the hardware didn't seem to affect the outcome, unless the creator of said archive happened to have the same CPU.)
  13. Member
    Join Date
    Mar 2024
    Location
    France
    I know this is an old thread, but in the emulation domain that question was solved long ago.
    https://github.com/tikki/trrntzip
    https://romvault.com/trrntzip/
    https://github.com/tikki/t7z
    https://github.com/BubblesInTheTub/torrent7z
    Together these cover both the ZIP and 7z formats; sadly, there's nothing for RAR (yet?).

    EDIT: there's also https://reproducible-builds.org/, which is increasingly popular and should cover archives and other data too, so that an entire piece of software and its data are reproducible (binary-identical) when regenerated.
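    As I understand them, TorrentZip and torrent7z solve the forward-looking version of the problem rather than the original one: instead of guessing how an old archive was built, they define one canonical way of building it (among other things a fixed member order and fixed timestamps), so anyone holding the same files regenerates a bit-identical archive. A Python sketch of that idea only; it is not TorrentZip-compatible, and the timestamp, sort key and compression level are arbitrary choices. The output is stable for a given Python/zlib build, but a different zlib version could in principle change the compressed bytes.

```python
import io
import zipfile

# An arbitrary fixed timestamp: every member gets the same date, so the
# archive no longer depends on when or where the files were created.
FIXED_TIME = (1996, 12, 24, 23, 32, 0)


def deterministic_zip(files):
    """Build a ZIP whose bytes depend only on the names and contents given."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # A fixed member order, independent of how the input was collected.
        for name in sorted(files, key=lambda n: (n.lower(), n)):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 0  # otherwise depends on the host OS
            zf.writestr(info, files[name], compresslevel=9)
    return buf.getvalue()
```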
    Last edited by sebbu; 6th Mar 2024 at 14:18.