This is a bit removed from video per se, but I have been using ProjectX to extract subtitle text from a DVB-T stream captured with EyeTV. My plan is to add that text as metadata to an MP4 transcode of the DVB MPEG-2 video, so that the video becomes searchable.

But the subtitle text data typically looks like this:

in_00:16:33.824|out_00:16:33.884 United States dominance of
in_00:16:33.884|out_00:16:34.144 United States dominance of
in_00:16:34.144|out_00:16:34.204 United States dominance of the
in_00:16:34.204|out_00:16:34.444 United States dominance of the
in_00:16:34.444|out_00:16:34.504 United States dominance of the
in_00:16:34.504|out_00:16:34.804 United States dominance of the
in_00:16:34.804|out_00:16:34.884 United States dominance of the world financial
in_00:16:34.884|out_00:16:35.664 United States dominance of the world financial
in_00:16:35.664|out_00:16:35.724 world financial system
in_00:16:35.724|out_00:16:36.544 world financial system
in_00:16:36.544|out_00:16:36.604 world financial system that
in_00:16:36.604|out_00:16:37.164 world financial system that
in_00:16:37.164|out_00:16:37.224 world financial system that
in_00:16:37.224|out_00:16:37.424 world financial system that
in_00:16:37.424|out_00:16:37.504 world financial system that inevitably
in_00:16:37.504|out_00:16:37.804 world financial system that inevitably
in_00:16:37.804|out_00:16:37.864 inevitably the tide
in_00:16:37.864|out_00:16:38.204 inevitably the tide
in_00:16:38.204|out_00:16:38.264 inevitably the tide is
in_00:16:38.264|out_00:16:38.424 inevitably the tide is
in_00:16:38.424|out_00:16:38.484 inevitably the tide is moving
in_00:16:38.484|out_00:16:38.684 inevitably the tide is moving

Basically, I would like to remove duplicate lines based on the first 11 characters of each line, so that the timings and text are kept but most of the repetition is eliminated. I have been trying to do this with TextWrangler and its "grep" pattern functions, but as a non-programmer I am very bamboozled. Would anyone be kind enough to tell me whether this is possible and, if so, how it could be done?
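One way to do this outside TextWrangler is a short Python script. This is only a sketch under some assumptions: the ProjectX output is taken to be a UTF-8 plain-text file with one entry per line, and the script name `dedupe_subs.py` and the function `dedupe_by_prefix` are hypothetical names chosen for illustration. It keeps the first line for each distinct 11-character prefix (for example `in_00:16:33`, i.e. the start time truncated to whole seconds), drops the others and preserves the original order.

```python
import sys

# Number of leading characters used to decide whether two lines are duplicates.
# 11 characters covers e.g. "in_00:16:33", the start time cut to whole seconds.
PREFIX_LEN = 11


def dedupe_by_prefix(lines, prefix_len=PREFIX_LEN):
    """Keep only the first line for each distinct prefix, preserving order."""
    seen = set()
    kept = []
    for line in lines:
        key = line[:prefix_len]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Only run on a file when a path is given on the command line (hypothetical usage):
#   python3 dedupe_subs.py subtitles.txt > subtitles_deduped.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        # Strip line endings and skip blank lines.
        lines = [line.rstrip("\n") for line in f if line.strip()]
    for line in dedupe_by_prefix(lines):
        print(line)
```

Because an 11-character key groups lines by whole seconds of start time, a word that first appears later within the same second (for example "moving" at 00:16:38.424 in the sample above) is missing from the line kept for that second and only survives if a later kept line repeats it. Changing `PREFIX_LEN` changes how coarse that grouping is.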