Board: Development & Technical Discussion
Topic: Re: python script compare lines in 2 text files and output matches
by odolvlobo on 24/09/2019, 16:57:54 UTC
Code:
...
for firstline in firstfile:
  if firstline in secondfile:
    print >>f1, (firstline)
For a small number of lines, that might be OK. But for a large number of lines, it would be faster to sort the files first and then compare. Sorting and then comparing is O(n log n), whereas checking every line of one file against the other is O(n^2).

Like this:

Does it have to be Python? The comm command, run from Bash, does exactly what you need:
Code:
comm -12 <(sort file1) <(sort file2)

I don't know how fast a Python loop would be, but the above code takes about 0.05 seconds for 2 files with 50,000 lines each.

The comparison, in pseudocode, looks like this:
Code:
e1 = file1.begin()
e2 = file2.begin()
while e1 ≠ file1.end() and e2 ≠ file2.end()
    if *e1 < *e2
        ++e1
    else if *e1 > *e2
        ++e2
    else
        print *e1
        ++e1
        ++e2
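
Since the thread asked for Python, here is a rough translation of the pseudocode above. It is only a sketch: the function names common_lines and read_sorted_lines are made up for illustration, and it assumes both files fit in memory and that plain string ordering is acceptable.

```python
def common_lines(sorted_a, sorted_b):
    """Return the lines found in both lists. Both lists must already be sorted."""
    matches = []
    i = j = 0
    # Walk both lists in step, always advancing the side with the smaller line.
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] < sorted_b[j]:
            i += 1
        elif sorted_a[i] > sorted_b[j]:
            j += 1
        else:
            # Equal lines: record the match and advance both sides.
            matches.append(sorted_a[i])
            i += 1
            j += 1
    return matches


def read_sorted_lines(path):
    """Read a text file and return its lines, newline stripped, in sorted order."""
    with open(path) as handle:
        return sorted(line.rstrip("\n") for line in handle)
```

Usage would be something like common_lines(read_sorted_lines("file1"), read_sorted_lines("file2")), with the result printed or written out line by line. The sort is the O(n log n) part; the comparison loop itself only makes a single pass over each list.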

Computer science FTW.