Given this simple Python script:
#!/usr/bin/python
from subprocess import *
proc = Popen(["/bin/sh", "-c", "for i in `seq 1 10`; do echo $i; sleep 1; done"], stdout=PIPE, bufsize=1)
a = proc.stdout.__iter__()
print a.next()
it is interesting to notice that a.next()
takes 10 seconds to run: the first line is not returned until the whole shell loop has finished. I tried
with bufsize=1
(line buffered) and bufsize=0
(no buffering), and the behaviour is the same in both cases.
I tried:
sh -c 'for i in `seq 1 10`; do echo $i; sleep 1; done' | cat
sh -c 'for i in `seq 1 10`; do echo $i; sleep 1; done' | less
sh -c 'for i in `seq 1 10`; do echo $i; sleep 1; done' > /tmp/test
In all three cases the output appears one line at a time, so the buffering must be happening inside Python, not in the shell or the pipe.
Using readline()
instead of the iterator works as expected, printing the first line immediately:
#!/usr/bin/python
from subprocess import *
proc = Popen(["/bin/sh", "-c", "for i in `seq 1 10`; do echo $i; sleep 1; done"], stdout=PIPE, bufsize=1)
print proc.stdout.readline()
Therefore Python 2 does hidden read-ahead buffering when using __iter__()
on a file object: the file iterator reads ahead in large chunks for performance, so it does not yield the first line until that chunk is filled (here, not until the subprocess exits and the pipe hits EOF). Good to know.
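As an aside (not in the original, and assuming a Python 3 environment): Python 3's io module does not have this hidden read-ahead, so iterating the pipe directly yields lines as soon as they arrive. A minimal sketch, with the loop shortened from the original 10-second example:

```python
import subprocess
import time

# A shell loop that emits one line every 0.2 s.
proc = subprocess.Popen(
    ["/bin/sh", "-c", "for i in 1 2 3; do echo $i; sleep 0.2; done"],
    stdout=subprocess.PIPE,
)
start = time.time()
first = next(iter(proc.stdout))  # in Python 3 this does not block until EOF
elapsed = time.time() - start
proc.wait()
```

Here first is b"1\n" and elapsed is well under the 0.6 s the loop takes overall.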
Here is how to get an iterator that does not introduce unwanted buffering:
def iterlines(fd):
    while True:
        line = fd.readline()
        if len(line) == 0:
            break
        yield line
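To check that iterlines() really avoids the read-ahead, you can time how long the first line takes to arrive (shown here in Python 3 syntax, where a.next() becomes next(a); the sleep is shortened from the original example):

```python
import subprocess
import time

def iterlines(fd):
    # Plain readline() loop: no hidden read-ahead buffer.
    while True:
        line = fd.readline()
        if len(line) == 0:
            break
        yield line

proc = subprocess.Popen(
    ["/bin/sh", "-c", "for i in 1 2 3; do echo $i; sleep 0.2; done"],
    stdout=subprocess.PIPE,
)
start = time.time()
first = next(iterlines(proc.stdout))
elapsed = time.time() - start
proc.wait()
```

The first line comes back almost immediately instead of after the subprocess exits.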