Thursday, July 24, 2008

Python string concatenation performance

A small test revealed that "".join(listOfStrings) is not faster than plain +=. The .join() version, which builds the list with .append() and then joins it, is slower.

Time with .join():
real    0m2.908s
Time with +=:
real    0m1.742s
The test:
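#!/usr/bin/env python

def combine(inc, count):
    # Build the string with repeated +=.
    text = ""
    for i in xrange(count):
        text += inc
    return len(text)

def combineByJoin(inc, count):
    # Collect the pieces in a list, then join them once at the end.
    parts = []
    for i in xrange(count):
        parts.append(inc)
    text = "".join(parts)
    return len(text)

def main():
    inc = "a" * 10
    # Run one variant per invocation; swap the comments to time the other.
    print combine(inc, 10000000)
    #print combineByJoin(inc, 10000000)

main()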
Tested on Python 2.5.2.
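
The two variants can also be timed inside one process instead of running the script twice. The following is only a sketch under a few assumptions: it is written for Python 3 (range and the print function instead of xrange and the print statement), and the helper time_both, the name combine_by_join and the smaller default count are made up for illustration. Numbers from it will not match the timings above.

```python
import timeit

def combine(inc, count):
    # Build the string with repeated +=.
    text = ""
    for _ in range(count):
        text += inc
    return len(text)

def combine_by_join(inc, count):
    # Collect the pieces in a list, then join them once at the end.
    parts = []
    for _ in range(count):
        parts.append(inc)
    return len("".join(parts))

def time_both(inc="a" * 10, count=100000, repeat=3):
    # Hypothetical helper: returns the best wall-clock time per variant.
    results = {}
    for name, func in (("+=", combine), ("append+join", combine_by_join)):
        runs = timeit.repeat(lambda: func(inc, count), number=1, repeat=repeat)
        results[name] = min(runs)  # the minimum is the least noisy run
    return results

if __name__ == "__main__":
    for name, seconds in time_both().items():
        print("%-12s %.4f s" % (name, seconds))
```

Taking the minimum of a few repeats is a common way to reduce noise from other processes; the result still depends on the interpreter version and the machine.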

2 comments:

rob said...

This is not a fair comparison as you are building up a list before you combine by joining. Just comparing concatenation vs. joining yields different results:

In [20]: text=""
In [21]: a=['aaaaa']*1000000
In [22]: %time for c in a: text+=c
CPU times: user 0.67 s, sys: 0.03 s, total: 0.70 s
Wall time: 0.71 s
In [24]: %time text="".join(a)
CPU times: user 0.10 s, sys: 0.01 s, total: 0.10 s
Wall time: 0.10 s
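
To see how much of the append-then-join cost comes from building the list and how much from the join itself, the two phases can be timed separately. This is a rough sketch, not the measurement used above: it assumes Python 3, and time_phases and its default arguments are hypothetical names chosen for illustration.

```python
import timeit

def time_phases(piece="aaaaa", count=100000):
    # Hypothetical helper: times list building and joining as separate steps.
    parts = []

    def build():
        for _ in range(count):
            parts.append(piece)

    # Each phase runs exactly once, so the list is built only one time.
    build_seconds = timeit.timeit(build, number=1)
    join_seconds = timeit.timeit(lambda: "".join(parts), number=1)
    return build_seconds, join_seconds, len("".join(parts))

if __name__ == "__main__":
    build_seconds, join_seconds, length = time_phases()
    print("build list: %.4f s" % build_seconds)
    print("join:       %.4f s" % join_seconds)
    print("length:     %d" % length)
```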

Ivo said...

Yes, you are right. I should not have written that join() is slower.
A more correct statement would be:
Using .append() followed by .join() is slower than using += directly.