Moving Big Projects to Python 3

Lennart Regebro

EuroPython, 2019

My name is Lennart. I was born in Sweden, but I live in Poland, with my wife, daughter, cats and fruit trees.

I have been using Python since Python 1.5.2, and I have been working with Python and the web since 2001.

python3porting.com

And I wrote the book on how to move from Python 2 to Python 3. You can find it in both HTML and PDF on python3porting.com. It's open source; the source is on GitHub.

BriteCore is the leading technology platform for modern insurance providers. Fully managed through Amazon Web Services cloud, BriteCore is continually updated to guarantee maximum security, efficiency, and durability at scale. Over 45 Carriers, MGAs, and InsureTechs rely on BriteCore for their core, data, and digital needs.

I work for BriteCore.

We do the type of software that insurance companies use to deal with insurance policies and claims.

We work remotely, and yes, we are hiring. If you are looking for a job and want to work remotely, talk to me! I'm new to the whole recruiting bit, but talk to me anyway.

We are not running on Python 3 yet, it's still an ongoing effort.

But I should also shout out to my previous job Shoobx, where we successfully moved a large and insanely complex system to Python 3 last year.

So, let's go back to the Stone Age.

This is you back in the stone age, and this is your framework.

You or your company created some Python application, and you did such a good job that it's still running!

It's probably a web app, and you are probably running it on some old version of maybe Web2py, maybe TurboGears, maybe even Zope!

And you have been bravely running away from Python 3 for years.

But you can't run any longer. Time to face the monster. Because next year Python 2 commits suicide.

But don't fear Python 3, it's not The Killer Rabbit of Caerbannog, it's just a regular old Python.

Embrace the Serpent

The hard part of porting is getting your old system into a state where it's easy to port. The porting itself is quite easy.

Stage 1: Stop being a fire department!

The first thing you need to do is stop being a fire department.

Many large organizations are constantly putting out fires. That's not a good situation in which to port to Python 3, because if the changes you make as part of normal development break production, and you need to put out that fire, then moving to Python 3 is going to start new fires. Also, all your developers will be too busy putting out fires to port things.

So the first thing you need to do is to get out of firefighting mode. That in itself is a whole talk, and I'm not the one to give that talk anyway, but I'll quickly mention a few things I've seen DevOps teams do to fix this.

Increase test coverage

Continuous integration

Use staging servers

Automatic deployment

Monitoring

This slide assumes that your software is a service of some sort, with a production environment and DevOps, because that's the only kind of firefighting I know.

You HAVE to have tests to move to Python 3, but tests also help with stability. And you have to run those tests, and that means that for any sizeable project you must have CI. I'll talk more about that later.

Switching to Python 3 without a staging setup would also be insane.

Automatic deployment is helpful. Deployment of a new release of the software should just be a push of the button. Extra points if master is released and pushed to staging every night, so you know that your deployment is working.

Monitoring is good, you want to know that there is a problem before your users know it.

Isolated production environment

Definitely Virtualenv

Maybe containers

There are some Python specific things we can do to harden production. One thing is to run in some sort of isolated environment. This typically means a virtualenv, or buildout.

Containers are in fashion now, and they help with isolation, so you don't get weird interactions with new versions of OS packages. Docker, for example.

Docker

This is probably going to be obvious to most of you, but I've only realized this in the last few months, so I'll mention it because it's new to me!

If you use Docker in production, every time your package requirements change you must build a new Docker image, because that image includes the virtual environment for the servers. So if some requirement change creates conflicts, you don't notice it during deployment, but while building the Docker images! Yay, the deployment didn't mess up production; it stopped before production was even touched!

In addition, you can then use those images in CI, and even to develop on, so you know that developers run the same environment as production!

So with all these things in place...

... your firefighters now can take it easy.

Stage 2: Preparing

When the firefighting is done, it's time to prepare.

Pin all versions

To make sure that you know what you install, you should pin all versions of all packages.

beautifulsoup4==4.6.0
amqp==1.4.7
boto==2.38.0
boto3==1.6.23
botocore==1.9.23
datadog==0.12.0
placebo==0.8.1
awscli==1.10.10
colorama==0.2.4
coverage==3.5.2
dropbox==7.2.1
epydoc==3.0.1
ezodf2==0.2.9

pip unfortunately has no flag I can find to require this. One way to do it would be to verify in the install script that what you installed matches the requirements file, for example by comparing your pip freeze output with the requirements file. That way, you would get an error if you upgrade one package and it introduces new dependencies.
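As an illustration, here is a minimal sketch of such a check, assuming a small script of my own (not something from the talk) that runs right after pip install:

import subprocess
import sys

def read_pins(path):
    # Collect "name==version" pins, ignoring comments and --hash options.
    pins = {}
    for line in open(path):
        line = line.split('#')[0].split('--hash')[0].strip()
        if '==' in line:
            name, version = line.split('==', 1)
            pins[name.lower()] = version.strip()
    return pins

# What is actually installed, according to pip freeze.
frozen = subprocess.check_output(
    [sys.executable, '-m', 'pip', 'freeze'], universal_newlines=True)
installed = {}
for line in frozen.splitlines():
    if '==' in line:
        name, version = line.strip().split('==', 1)
        installed[name.lower()] = version

# Fail the install if anything differs from requirements.txt.
failed = False
for name, version in sorted(read_pins('requirements.txt').items()):
    if installed.get(name) != version:
        print('%s: wanted %s, got %s' % (name, version, installed.get(name)))
        failed = True
if failed:
    sys.exit(1)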

beautifulsoup4==4.6.0 --hash=sha256:7015e76bf32f1f574636c4288399a6de66
amqp==1.4.7 --hash=sha256:d5ab52b96c11a93324e4f8e5f35fac8a58e69eba0555
boto==2.38.0 --hash=sha256:758498c77f16e26b8b61af26f8d0dbb119b713f2d8d
boto3==1.6.23 --hash=sha256:f270f058f46aff9208fe29cffee79a46a7020cd186
botocore==1.9.23 --hash=sha256:fa84972784e55ae5f89c2d33b1b10b6ba028cb3
datadog==0.12.0 --hash=sha256:453facddea3bdce14bfe6518d65db8ec8f1b3309
placebo==0.8.1 --hash=sha256:6c15ce4a6be158603550fa211f7ce7cc0327c5e45
awscli==1.10.10 --hash=sha256:f8d53c0e3211353a6d4eefb82432996eff535ad4
colorama==0.2.4 --hash=sha256:f1a742e49fbb0838ab478339597580d8a5869f9f
coverage==3.5.2 --hash=sha256:22f8828c83958efc343666926362eb4a6cb38610
dropbox==7.2.1 --hash=sha256:34fe06b735358f4454d59e2ea3dadff63fa03e076
epydoc==3.0.1 --hash=sha256:87f191ef04783f8871ce085f8527c8519a74e401c7
ezodf2==0.2.9 --hash=sha256:bda2added62f3c3c02c9385fcaaf7a2beed3f3b42a

What you can do with pip is to add hashes to the requirements. If you add one hash, pip will require hashes for all packages, effectively making sure no new requirement goes unpinned.

This makes for huge requirements files with loads of hashes in them. But it also adds extra security.

Increase test coverage (again)

It is very good to cover every line, because lines that aren't covered may contain hidden Python 2 code.

What percentage of test coverage you want is really a matter of opinion. 100% is awesome, but likely practically unattainable. 90-95% would be my target. You can bridge the gap somewhat by carefully reading all non-covered lines and looking for Python 2 syntax; at some point that becomes easier than writing a test.

Mock gotchas

There is this philosophy in mocking that you should test each function separately and that all calls from that function should be mocked out.

But then you only test that the function did what you told it to do. You don't test that it WORKS, and if the API you call changes, the test will still pass. This is obviously a problem with Python 3, because you are effectively mocking in Python 2 calls and behavior. So this type of testing is useless when porting to Python 3. If you do this, you need to have 95% coverage from your integration tests.
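Here is a small illustration of my own (the names are hypothetical, not from the talk): the fully mocked test keeps passing under Python 3, even though the real code now returns bytes instead of text.

import io

try:
    from unittest import mock  # Python 3
except ImportError:
    import mock  # the Python 2 backport

def load_greeting(path):
    # The real implementation opens the file in binary mode, so under
    # Python 3 it returns bytes, not text.
    with io.open(path, 'rb') as f:
        return f.read()

def shout(path):
    return load_greeting(path).upper()

def test_shout_fully_mocked():
    # This mocks in the Python 2 behavior: the test passes on both
    # versions, but says nothing about how shout() behaves with the
    # real loader, which hands back bytes under Python 3.
    with mock.patch(__name__ + '.load_greeting', return_value='hello'):
        assert shout('greeting.txt') == 'HELLO'

A test that exercises the real load_greeting() against a real file would catch the bytes/text mismatch; the mocked one never will.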

Upgrade dependencies

Upgrade all packages

Replace or port anything that isn't Python 3 compatible

Make sure you have the latest Python 2 compatible version of all your dependencies. Then make sure all your dependencies are Python 3 compatible. You may have to replace, or worst case, port, some of your dependencies at this point.

This stage can take a significant amount of time, especially if you have not been keeping your dependencies up to date.

Stage 3: Planning

What, planning after preparing? Well, you can plan before preparing as well; the order you do these in is up to you.

But planning is largely about whether you let everyone do the porting work or only a small team, and the preparing in stage 2 is nothing you can have 15 people working on in any case, so you only need the plan at this stage. You can make it earlier, of course.

When planning, I have three questions for you.

1. Can you stop adding features?

It depends very much on your business whether you can take a time-out from adding features to do the porting. But it will still take at least a few weeks, maybe longer. So can you stop adding features and stop firefighting for that long?

2. Do you have magic?

If some parts of your code are doing deep magic, they can be very hard to port. Then the few Python gods who actually understand that code will still be busy with it when everything else already works. Or the magic is so deeply integrated into the code that nobody can port their bits until it is fixed. In both of those cases, everyone who is supposed to port to Python 3 will be blocked.

3. How big is your team?

The famous mythical man-month remains mythical with Python 3 too. Putting 50 developers on the porting at the same time will not work: they will end up blocking each other, and you can't distribute the work properly. Ten isn't a problem; you can synchronize that, at least if they are in the same office. Maybe even 20, but no more than that. If your system is already split into multiple separate services, you can probably have each team port their bit on its own, so then you are already ahead of the game, but most of these big systems are monoliths.

Strategy: One big push!

You don't have deep magic. You can stop adding features.

Then you can do it all in one go.

One big push: Benefits

Takes less time

Less work in total

You can aim directly for Python 3 code

One big push: Drawbacks

High risk

All other work stops

So I don't recommend doing this. If you feel you can move your project to Python 3 in one go, then you would likely have done so already.

Strategy: Slow and steady

So porting big projects to Python 3 is usually done slowly and carefully. You port the code so that it runs on both Python 2 and Python 3, even though you keep running it on Python 2. And then, one day, you can finally switch and run it on Python 3.

Slow and steady: Benefits

Low risk

Doesn't disrupt normal operations

Slow and steady: Drawbacks

More work

Longer total time

You need dual version support

Strategy: Mix it up!

If you have a development team small enough to fit into one big country house, you can start with a Python 3 sprint for all the developers, but not aim directly for Python 3; aim for Python 2/3 compatible code. That way, when they come back half done, you can switch to having a dedicated team do the last bit, or just have people do it when there is no critical work.

This is what we did at Shoobx.

Mix: Benefits

Low risk

Only disrupts normal operation briefly

Everyone gets onboard and feels involved

Mix: Drawbacks

You need dual version support

Still slow

But all in all, I think this is a good option for smaller teams. Now, when in the following process you want to make the quick push is up to you. At Shoobx we did it more or less at what I call stage 4, i.e. the preparing was done before.

Stage 4: Porting

Set up your testing for Python 3

It's now time to start running your tests under Python 3, and at first this will obviously fail. But that's OK. What you want to do here is to prevent people from adding more Python 3-incompatible code while others are trying to add Python 3 compatibility.

Because if you allow that, you will never finish. And the trick to stopping it is in CI.

Call in the CI Gurus

You need to let your CI system keep track of which tests once DID pass under Python 3, and if a test that should pass no longer passes under Python 3, flag the test run as failed. But you can't require ALL tests to pass under Python 3 initially, because then all your builds will fail and you can never merge anything.
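One possible way to do that, assuming you use pytest (this is a sketch of my own, not a description of any particular CI system): keep a checked-in list of tests that are still expected to fail under Python 3, and mark them as strict expected failures, so that any new failure breaks the build, and a test that starts passing has to be removed from the list.

# conftest.py
import sys
import pytest

def pytest_collection_modifyitems(config, items):
    if sys.version_info[0] < 3:
        return  # on Python 2, run the suite as usual
    with open('py3-expected-failures.txt') as f:
        expected = set(line.strip() for line in f if line.strip())
    for item in items:
        if item.nodeid in expected:
            # strict=True makes an unexpected pass fail the run, so the
            # list can only shrink over time.
            item.add_marker(pytest.mark.xfail(
                reason='not yet ported to Python 3', strict=True))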

Modernize

Backwards compatible 2to3 fixers

Six

The compatibility layer

I'm sure you know what 2to3 is: it's a tool that will refactor your Python 2 code into Python 3 code. It doesn't do everything, but it's helpful. Modernize is an extension that generates code that stays Python 2 compatible, mostly by using six, which is the compatibility library between Python 2 and Python 3.

There's also another compatibility layer called python-future, which has its own extension called futurize. Python-future inserts a lot of magic to make the code compatible with both Python 2 and Python 3, and that magic has bitten me several times, so my recommendation is to avoid python-future.
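To give a feel for what modernize plus six produces, here is a small before/after example of my own, roughly the kind of rewrite the fixers generate:

from __future__ import print_function
import six

mapping = {'a': 1, 'b': 2}
name = u'Lennart'

# Python 2 only:
#     for key, value in mapping.iteritems():
#         print key, value
#     isinstance(name, basestring)

# Python 2 and Python 3:
for key, value in six.iteritems(mapping):
    print(key, value)
assert isinstance(name, six.string_types)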

Your first errors will be syntax and import errors. That's because some module will have a syntax error, and the modules trying to import from that module will then fail to import. So the first thing you want to do is fix those syntax errors.

One fixer at a time

If you run modernize on any larger code base, the system won't just keep working afterwards, and with the massive changes you get, it can be hard to figure out what went wrong. It's better to do it carefully.

Therefore, you should run one fixer at a time. Find the cause of the error you have, and if there is a fixer for that, run that fixer. Maybe even on just that one file. Big changes get confusing when they fail.

But manually fixing each single import or syntax error is boring. So use the modernize fixers to help alleviate the boredom.

Fix fix fix

This is where the book finally becomes useful, because it's about this part, and this part only.

The only difficult bits here are the deep magic parts, as mentioned before.

As you get more confident, and more things work, you can run fixers on more files in one go, but if many people are doing the porting, you risk merge conflicts.

Don't forget your dev environment

Do you have scripts to set up a development environment? Or are you using Docker? Something else? In any case, that environment should be able to be built under Python 2, Python 3, or both. Any build scripts you have need to support both versions, and so do any auxiliary scripts.

At BriteCore, for example, we have scripts that help you copy test databases, set up Docker images, and so on. It's usually easier if they support Python 2 and Python 3 first, so you don't have to keep two environments going. Sometimes the dev helper scripts already run in a separate virtualenv, and then you might be able to port them later. But then again, if they are separate, you might want to do them first as practice!

Write data migration tests

Do you get text strings when you expect text strings?

Are non-ascii chéracters interpreted corröctly?

Are you loading data from disk at some point?

Are you using pickles? ARE YOU?

You should take data that is created with the software running on Python 2, and write tests to make sure you get the right data in Python 3.
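For example, here is a sketch of such a test (the fixture file and field names are hypothetical, mine, not from the talk): load a pickle written by the Python 2 system and check that the values come back as the types you expect under Python 3.

import pickle
import sys

def load_legacy_pickle(path):
    with open(path, 'rb') as f:
        if sys.version_info[0] >= 3:
            # The default (ASCII) can't decode Python 2 str data that
            # contains non-ASCII bytes; latin-1 keeps every byte value.
            return pickle.load(f, encoding='latin-1')
        return pickle.load(f)

def test_policy_pickle_from_python_2():
    policy = load_legacy_pickle('fixtures/policy_py2.pickle')
    # Text should come back as text, not bytes...
    assert isinstance(policy['holder_name'], type(u''))
    # ...and non-ASCII characters should survive the trip.
    assert policy['holder_name'] == u'Åsa Regebrö'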

Push to staging

Test it carefully, manually, with real data

If all tests pass, or maybe even before all tests pass, try it on staging.

Stage 5: Push to production

Be prepared to fall back if possible!

If you have the possibility to move customers one by one, do that. Start small and work your way up.

If you have to migrate the database, you may not be able to go back to Python 2, so in that case you need to be extra careful.

Celebrate!

Clean up

And then clean up. Actually, that picture is a misrepresentation.

Stage 6: Clean the code

Because this is the fun bit. This is where you can go through the code and remove loads of old cruft. See it as an opportunity to just prettify the code.

Summary

Stop firefighting

Prepare + Plan

Fix tests under Python 3

Push to staging

Push to production

Clean up the code

Questions?

britecore.com/careers
