Timeout class with retry in Python
In Paramiko, the SSHClient’s connect method has a timeout parameter, but in some common situations it rarely, if ever, actually times out. Since moving from San Diego’s Cox Cable to Round Rock’s Time-Warner, I’ve been seeing stuck connections much more often.
The fix appears to be to use signals: set an alarm before running the line/function that might get stuck, and then remove the alarm afterward. If the alarm has time to go off, it will generate an exception that can then be handled.
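As a minimal sketch of the alarm mechanism on its own, without any retrying: the names AlarmTimeout and run_with_alarm are made up for this illustration, and the sketch is written for Python 3, unlike the Python 2 code in this post. It assumes a Unix-like system, since SIGALRM is not available on Windows, and it must run in the main thread.

```python
import signal
import time


class AlarmTimeout(Exception):
    """Raised by the alarm handler (a name made up for this sketch)."""


def _raise_timeout(signum, frame):
    # Runs when SIGALRM arrives; the exception surfaces in the stuck call.
    raise AlarmTimeout('call did not finish in time')


def run_with_alarm(func, seconds):
    """Run func(), raising AlarmTimeout if it takes longer than `seconds`."""
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)  # set the alarm (whole seconds only)
    try:
        return func()
    finally:
        signal.alarm(0)  # remove the alarm, whether or not func() finished


if __name__ == '__main__':
    print(run_with_alarm(lambda: 'finished', 5))
    try:
        run_with_alarm(lambda: time.sleep(3), 1)  # stands in for a stuck call
    except AlarmTimeout as error:
        print('Timed out:', error)
```

The time.sleep call only stands in for a connection that hangs; any blocking call in the main thread should be interrupted the same way.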
Since the failures to connect appear to be completely random, I decided to combine the timeout with the option to retry the connection:
    import signal

    class TimeoutError(Exception):
        pass

    #Paramiko timeout does not work often, if at all
    # So create a timeout class

    class Timer(object):

        def __init__(self, function=None, seconds=30, tries=3, errorMessage='Timeout'):
            self.seconds = seconds
            self.tryLimit = tries
            self.tries = 1
            self.function = function
            self.errorMessage = errorMessage

        def act(self):
            signal.signal(signal.SIGALRM, self.handleTimeout)
            signal.alarm(self.seconds)
            self.function()
            signal.alarm(0)

        def handleTimeout(self, signum, frame):

            if self.tries >= self.tryLimit:
                raise TimeoutError(self.errorMessage)

            else:
                print 'Timed out on try', self.tries, self.errorMessage
                self.tries = self.tries + 1
                self.act()
                print 'Succeeded on try', self.tries
The first class, a subclass of Exception, doesn’t do anything except give us an appropriately-named exception. The second class will raise that exception if the passed function does not complete before the given number of seconds.
First, instantiate the timer; then run the “act” method. If it times out, it will (by default) try again twice; that is, it will try three times. The function is retried simply by calling the act method again.
For example:
    import socket  # needed for socket.error below
    from paramiko import SSHClient, SSHException

    class SFTPClient(Maker):
        …

        def ensureConnection(self, purpose=None):

            try:
                timer = Timer(function=self.openConnection, errorMessage=purpose)
                timer.act()
                return True

            except SSHException, errtext:
                self.warning('Unable to open SSH connection:', errtext, purpose)

            except socket.error, errtext:
                self.warning('Socket error connecting:', errtext, purpose)

            except EOFError, errtext:
                self.warning('Server has terminated with EOFError:', errtext, purpose)

            except TimeoutError, errtext:
                self.warning('Timeout error connecting:', errtext)

            # every handled failure falls through to here
            return False

        def openConnection(self):
            self.client = SSHClient()
            self.client.load_system_host_keys()
            self.client.connect(hostname=self.host, username=self.user, timeout=20)
            self.sftp = self.client.open_sftp()
Usually, if this code times out, it will succeed on the second try; only rarely will it fail on all three tries.
Timed out on try 1 uploading stage file string /Stage/Mimsy/Books/no-one-left-lie.html
Timed out on try 2 uploading stage file string /Stage/Mimsy/Books/no-one-left-lie.html
Succeeded on try 3
This could be used with other kinds of errors, but it is especially appropriate for timeouts, because the nature of the timeout is that whatever problem there was thirty seconds ago probably no longer exists.
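For readers on Python 3, here is a rough sketch of the same timeout-and-retry idea. It is a variation, not the Timer class above: the alarm handler only raises an exception, and an ordinary loop does the retrying. The names AttemptTimeout, act_with_retries, and flaky_connect are invented for this illustration, and the sketch assumes a Unix-like system and the main thread.

```python
import signal
import time


class AttemptTimeout(Exception):
    """Hypothetical exception: one attempt ran out of time."""


def _on_alarm(signum, frame):
    raise AttemptTimeout()


def act_with_retries(function, seconds=30, tries=3, label='Timeout'):
    """Call function() up to `tries` times, allowing `seconds` per attempt."""
    signal.signal(signal.SIGALRM, _on_alarm)
    for attempt in range(1, tries + 1):
        signal.alarm(seconds)
        try:
            result = function()
        except AttemptTimeout:
            if attempt == tries:
                raise AttemptTimeout(label)  # out of tries: give up
            print('Timed out on try', attempt, label)
        else:
            if attempt > 1:
                print('Succeeded on try', attempt)
            return result
        finally:
            signal.alarm(0)  # always clear the alarm before moving on


if __name__ == '__main__':
    calls = []

    def flaky_connect():
        # Stand-in for a connection that hangs on the first attempt only.
        calls.append(1)
        if len(calls) < 2:
            time.sleep(5)
        return 'connected'

    print(act_with_retries(flaky_connect, seconds=1, label='demo connection'))
```

Retrying from a loop rather than from inside the handler means each stuck call is unwound by the exception before the next attempt starts.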
October 20, 2014: Retry SSH connections after transient error
The timeout class works great for retrying connections after they time out, but what about more prosaic errors? I’ve been getting a bunch of AuthenticationException errors in my Python/Paramiko connection attempts lately. I’d been just capturing all SSHExceptions (of which AuthenticationException is a subclass) and reporting the error, but this is just a transient error that almost always goes away on the very next upload.
That makes it a perfect candidate for retrying the connection. I renamed the class from Timer to Persistence, because this more generic class is going to be more persistent at making connections.
    import signal
    from time import sleep  # used by tryAgain
    from paramiko import SSHException, AuthenticationException

    # TimeoutError is the exception class defined with the original Timer

    class Persistence(object):

        def __init__(self, function=None, seconds=30, tries=3, errorMessage='Timeout'):
            self.seconds = seconds
            self.tryLimit = tries
            self.tries = 1
            self.function = function
            self.errorMessage = errorMessage

        def act(self):
            signal.signal(signal.SIGALRM, self.handleTimeout)
            signal.alarm(self.seconds)

            try:
                self.function()

            except AuthenticationException, error:
                self.tryAgain(AuthenticationException(error), 'Authentication exception')
            signal.alarm(0)

        def tryAgain(self, exception, message):

            if self.tries >= self.tryLimit:
                raise exception

            else:
                print message, 'try', self.tries, self.errorMessage
                sleep(2*self.tries)
                self.tries = self.tries + 1
                self.act()
                print 'Succeeded on try', self.tries

        def handleTimeout(self, signum, frame):
            self.tryAgain(TimeoutError(self.errorMessage), 'Timed out')
All it really does is add a tryAgain method that can be called both by the handleTimeout method and by any except clause in act’s try/except. If the failure persists through the final try (the third, by default), the exception is passed back up as normal.
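The retry-on-exception half of this can be sketched in Python 3 without signals at all. This is an illustration only, not the Persistence class: persist and TransientError are hypothetical names, with TransientError standing in for something like Paramiko’s AuthenticationException so that the sketch runs without Paramiko installed. The wait grows by two seconds per try, as in tryAgain.

```python
import time


class TransientError(Exception):
    """Stand-in for a transient failure (hypothetical, for this sketch)."""


def persist(function, retry_on=(TransientError,), tries=3, pause=time.sleep):
    """Call function(), retrying when it raises one of `retry_on`."""
    for attempt in range(1, tries + 1):
        try:
            return function()
        except retry_on as error:
            if attempt == tries:
                raise  # out of tries: pass the exception back up as normal
            print(type(error).__name__, 'on try', attempt)
            pause(2 * attempt)  # wait 2, 4, ... seconds before the next try


if __name__ == '__main__':
    outcomes = [TransientError('first try fails'), 'uploaded']

    def flaky_upload():
        outcome = outcomes.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    # Skip the real wait in this demo by passing a do-nothing pause.
    print(persist(flaky_upload, pause=lambda seconds: None))
```

Passing the pause function in as a parameter is a design choice for this sketch; it makes the waiting easy to skip when trying the code out.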