VideoHelp Forum




  1. Member Chopper Face's Avatar
    Join Date
    Jun 2001
    Location
    Ottawa, Ontario, Canada
    Here's where I'm at. I have not been able to find any software for this.

    I have a DVD which is in Japanese with no subtitles.

    I have a subbed avi that is ripped from that DVD.

    I want to get those subtitles in some sort of timed text file so that I can rip the DVD and add the subtitles.

    Is there software out there that can just look through the .avi and create a timed text file?

    I know SubRip will only recognise exact pixel-for-pixel matches for characters, which I'm not going to get here. The image was compressed when it was converted to DivX, so from frame to frame the same sub won't look identical enough for it to find a match.

    I'm not completely averse to retyping the subs if I can get them timed. I'd just like to save myself the trouble of manually retiming everything, as it's very time-consuming and I hope to do this with many episodes.

    I have been able to do a little work on it with VirtualDub filters, which might help.

    The original sub looks like this:

    I've cropped it and added some white bars because there's a lot of unsubbed space on the screen and there was black leaking in from the edges.

    If I use a threshold filter set to 1, only the deep blacks of the subs get through, but you can't really see what is there:


    If I set the threshold a little higher, at about 10 or so, I can actually read the subs (not sure if software could), but there are more off-black colours scattered throughout the episode that come through.
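
    As a rough sketch of what the threshold step amounts to (this is not VirtualDub's actual code): count how many pixels are at or below the cut-off. The function name `count_dark`, the 8-bit greyscale buffer, and the choice of `<=` rather than `<` are all assumptions made for illustration.

```c
#include <stddef.h>

/* Count pixels whose grey level is at or below `threshold` (0 = black).
   `pixels` is assumed to be an 8-bit greyscale buffer holding `n` values;
   the name and the <= comparison are illustrative assumptions. */
size_t count_dark(const unsigned char *pixels, size_t n, unsigned char threshold)
{
  size_t i, dark = 0;

  for (i = 0; i < n; i++)
    if (pixels[i] <= threshold)
      dark++;
  return dark;
}
```

    With a threshold of 1 only the near-black outline of the subs would be counted; raising it to 10 lets more of the off-black background through, which matches what I'm seeing.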


    Here are some ideas I've had that might work but would take some effort, so I'm putting them off to see if I can come up with an easier solution. These haven't been tested, so I'm not sure how successful they might be.

    1) Convert to some format that SubRip will understand and make it look for something like a 1x1 pixel, which is something that would be found whenever there is a sub on the screen. Since the picture changes slightly even with the same sub, this would not give me the same value for every frame, so each frame would probably generate a new line for the sub. I'd have to write something to go through the file and check whether the number of little black pixels in each frame was close to the previous one. If it was close, assume the sub hasn't changed and merge the lines, keeping the start time of the first and the end time of the second. Drop any lines with fewer than a certain number of pixels, because that could just be a few dots getting through. Once this is done I can use software to jump from line to line and type in the subs from the image to match each line. Subtitle Workshop could do this.

    2) Rip all frames to .bmp format. Check each file to see how much black there is in it. This can be done by converting to a 2 colour bmp and just checking the bits. Store all these values in a file indexed by the time range of each frame. Go through it as in idea 1 to guess when the sub changes or is not present, and narrow down the time ranges. Then go through and add the subs manually.

    3) Rip all frames to .bmp format. Use some kind of OCR software to automatically check each file and see if it finds any text in it. Put all that in one big timed text file, then go through the file merging lines with the same text together, with the range spanning from the first to the last.

    There are a few instances where a good amount of black leaks through into the image, which would probably interfere with all of these. They're rare CG scenes that I could probably find and cut out manually.
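
    The merging step that ideas 1 and 2 share could be sketched like this. It is untested against real frames; the `Range` type, the function name `merge_frames`, and both constants (100 pixels and 20%) are guesses I'd expect to tune.

```c
#include <stdlib.h>

#define MIN_PIXELS 100   /* assumed: fewer dark pixels than this means "no sub" */
#define TOLERANCE  0.20  /* assumed: counts within 20% are "the same sub" */

typedef struct { int start, end; } Range;   /* frames start .. end-1 */

/* Group consecutive frames with similar dark-pixel counts into ranges.
   Writes at most `max` ranges to `out` and returns how many were stored. */
int merge_frames(const int *counts, int n, Range *out, int max)
{
  int i, found = 0, start = -1;

  for (i = 0; i <= n; i++) {
    int cur  = (i < n) ? counts[i] : 0;     /* extra empty frame closes the last sub */
    int prev = (i > 0) ? counts[i-1] : 0;
    int has_sub = cur >= MIN_PIXELS;
    int changed = abs(cur - prev) > TOLERANCE * prev;

    /* The sub that was showing has ended or been replaced */
    if (start >= 0 && (!has_sub || changed)) {
      if (found < max) {
        out[found].start = start;
        out[found].end = i;
        found++;
      }
      start = -1;
    }
    /* A new sub begins on this frame */
    if (has_sub && start < 0)
      start = i;
  }
  return found;
}
```

    Given per-frame counts like 0, 0, 500, 510, 495, 0, this would report a single sub covering frames 2 to 4. Two subs with a similar amount of text that follow each other without a gap would still be merged into one range, so some manual splitting would be needed.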

    It would also help me if there were a filter I could set up that would detect just the pinkish colour of the subs.
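
    If no ready-made filter turns up, a per-pixel test along these lines might be a starting point. The function name `is_pinkish` and every numeric limit in it are guesses; I haven't measured the real colour of the subs, so the limits would need adjusting against an actual frame.

```c
/* Return 1 if an RGB pixel looks "pinkish", 0 otherwise.
   All four limits are illustrative guesses, not measured values. */
int is_pinkish(unsigned char r, unsigned char g, unsigned char b)
{
  return r >= 180        /* strong red */
      && b >= 100        /* a fair amount of blue */
      && r - g >= 40     /* clearly more red than green */
      && r >= b;         /* red still dominates blue */
}
```

    Counting the pixels that pass this test per frame, instead of counting black ones, should be less affected by dark scenery, though compression will smear the colour somewhat.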

    Any ideas anyone has are appreciated.
  2. Member Chopper Face's Avatar
    Join Date
    Jun 2001
    Location
    Ottawa, Ontario, Canada
    I finally tried out option 2. It's pretty much working. It's not perfect: sometimes a black object moves behind the subs and makes it look like the sub is changing when it's not, in which case I just merge the 2 lines. Other times two subs follow each other directly without a pause, and if they have a similar amount of text the lines aren't split because the program can't tell the sub has changed. In that case I split it. For the most part it's pretty handy and makes my work a lot easier than retiming from nothing.

    Here's the code if anyone cares:
    Code:
    #include <stdio.h>
    
    #define FPS 29.97   /* frame rate of the ripped video */
    
    /* Print a time in seconds as an SSA timestamp, H:MM:SS.CC */
    static void print_time(double t)
    {
      int s = (int)t;
      printf("%i:%02i:%02i.%02i", s/3600, (s/60)%60, s%60, (int)(t*100)%100);
    }
    
    int main(void)
    {
      int    i, j, c, count, diff, last=0, start=0;
      char   filename[64];   /* room for the path plus any frame number */
      FILE   *fp;
      double starttime, endtime;
    
      printf("ScriptType: v4.00\n");
      printf("Collisions: Normal\n");
      printf("PlayResY: 864\n");
      printf("PlayDepth: 0\n");
      printf("Timer: 100.0000\n");
      printf("\n");
      printf("[V4 Styles]\n");
      printf("Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, TertiaryColour, BackColour, Bold, Italic, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, AlphaLevel, Encoding\n");
      printf("Style: Default,Arial,48,65535,65535,65535,-2147483640,-1,0,1,3,0,2,30,30,30,0,0\n");
      printf("\n");
      printf("[Events]\n");
      printf("Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text\n");
      /* Frames are numbered 0.bmp, 1.bmp, ...; stop at the first one missing */
      for (i=0; ; i++) {
        snprintf(filename, sizeof filename, "E:\\movies\\dvd\\pics\\%i.bmp", i);
        if ((fp = fopen(filename, "rb")) == NULL)
          break;
        /* Skip the first 53 bytes so most of the bmp header isn't counted */
        for (j=0; j<53; j++)
          fgetc(fp);
        /* In a 2 colour bmp a byte of 255 is normally 8 white pixels,
           so count every byte that has something darker in it */
        count = 0;
        while ((c = fgetc(fp)) != EOF)
          if (c != 255)
            count++;
        fclose(fp);
        diff = count - last;
        if (diff < 0)
          diff = -diff;
        /* A jump of more than 20% and more than 100 bytes means the sub changed */
        if (diff > 0.20*count && diff > 100) {
          if (last >= 100) {
            /* A sub was showing, so write out the line that just ended.
               The text is only a placeholder giving the frame and time range. */
            starttime = start/FPS;
            endtime = i/FPS;
            printf("Dialogue: Marked=0,");
            print_time(starttime);
            printf(",");
            print_time(endtime);
            printf(",*Default,,0000,0000,0000,,%i to %i\\n%f to %f\n", start, i, starttime, endtime);
          }
          start = i;
        }
        last = count;
      }
      return 0;
    }