...
for firstline in firstfile:
    if firstline in secondfile:
        print >>f1, (firstline)
For a small number of lines, that might be OK. But for a large number of lines, it would be faster to sort the files first and then compare. It's O(n log n) vs. O(n²).
Like this:
Does it have to be Python? The Bash command comm does exactly what you need:
comm -12 <(sort file1) <(sort file2)
I don't know how fast a Python loop would be, but the above command takes about 0.05 seconds for two files with 50,000 lines each.
Comparison pseudocode looks like this:
e1 = file1.begin()
e2 = file2.begin()
while e1 ≠ file1.end() and e2 ≠ file2.end()
    if *e1 < *e2
        ++e1
    else if *e1 > *e2
        ++e2
    else
        print *e1
        ++e1
        ++e2
Computer science FTW.
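For anyone who wants to stay in Python, the sort-then-compare pseudocode above could be sketched roughly like this. It is written for Python 3, and the function names and file-path parameters are made up for illustration, not taken from the question. It assumes both files fit in memory and that lines should match exactly once the trailing newline is stripped.

```python
def common_sorted_lines(lines1, lines2):
    """Yield the lines found in both inputs, in sorted order.

    Both inputs are sorted first (O(n log n)), then walked in
    lockstep with two indices, as in the pseudocode (O(n)).
    """
    a = sorted(lines1)
    b = sorted(lines2)
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1          # a[i] cannot appear in the rest of b
        elif a[i] > b[j]:
            j += 1          # b[j] cannot appear in the rest of a
        else:
            yield a[i]      # match: emit once and advance both sides
            i += 1
            j += 1


def write_common_lines(path1, path2, out_path):
    """Hypothetical helper: write the lines shared by two files to out_path."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = [line.rstrip("\n") for line in f1]
        lines2 = [line.rstrip("\n") for line in f2]
    with open(out_path, "w") as out:
        for line in common_sorted_lines(lines1, lines2):
            out.write(line + "\n")
```

Duplicates are paired off one-to-one, so a line that occurs twice in both files comes out twice. Note that Python compares strings by code point, so the output order may differ from what sort produces under your locale.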