Supporting Py2 & Py3 Part 1

Intro…

With the arrival of Python 3 in 3ds Max 2021/2022, Maya 2022 and MotionBuilder 2022, there are going to be a whole lot of Tech Art/Anim/Pipe codebases that need updating, either to support both versions or to transition completely across to Python 3.

But Shea, why not just stick with Py27? – you might ask – well I will tell you, you inquisitive little thing, you!

That is the official spiel.
It was something that I was roughly aware of but didn’t really think would factor into my day job, as my day job was working predominantly with Autodesk software and Autodesk follows https://vfxplatform.com/, which means Py27 was the version all DCCs were locked to for the foreseeable future.
However, last year 3ds Max 2021 brought Py37 and I went through the process of converting a large codebase to support both Py27 and Py37 during an update from 3ds Max 2018 to 3ds Max 2021.
On the whole it wasn’t that bad, but there are a bunch of gotchas that were tough to figure out along the way, so I share here what I learned during the process.

The requirements were:

  • Large codebase, written in Py27
  • About 50% was DCC agnostic, used in 3ds Max, Maya, MotionBuilder and standalone
  • Needed to have the codebase support both Py27 and Py37 as Maya and MotionBuilder had no Py37 support yet
  • Once all the other DCCs caught up, the codebase would become Py37 exclusive

What I dun gone learned

strings, bytes & unicode

Strings in Py2 and Py3 are completely different and this will likely be the main cause of headaches for you, as it sure was for me.

type testing

The predominant issue was type testing strings using isinstance.
I had opted to use the unicode_literals import from __future__, as the intention was for the codebase to be 100% Py37+ eventually, rather than continuously supporting both Py2 and Py3.
This added an extra layer of confusion around the handling of strings, but thankfully it was somewhat easily resolved.

In Py2, the common catch-all for type testing the various string classes is to test against basestring.
This won’t work in Py3, as basestring no longer exists.

my_string = "my string"

# works in Py2 and Py3:
isinstance(my_string, str)
# Output: 
>> True 

# only works in Py2, as basestring no longer exists in Py3:
isinstance(my_string, basestring)
# Output in Py3:
>> NameError: name 'basestring' is not defined

# only works in Py2, as str is now unicode and the unicode class no longer exists in Py3:
isinstance(my_string, unicode)
# Output in Py3:
>> NameError: name 'unicode' is not defined

The much more painful issue occurred when a string created in a module that has the unicode_literals import was type tested against str in a module that doesn’t have it. This results in silent bugs in Py2, as the unicode_literals string fails the test and triggers the incorrect branch of code:

# ModuleA.py:
from __future__ import unicode_literals

my_string = "my string"  # a unicode object in Py2, a str in Py3

# ModuleB.py (no unicode_literals import):
import ModuleA

var_to_test = ModuleA.my_string
if isinstance(var_to_test, str):
    pass  # do a thing because var_to_test is a string
else:
    pass  # do a separate thing because var_to_test is NOT a string

In the above example, the else branch is triggered in Py2 because unicode_literals turns every string literal in ModuleA into a unicode object, which is not an instance of the built in str ( a byte string in Py2 ) that ModuleB tests against. In Py3 both are str, so the if branch runs. Sneaky little bastard!


To ensure your string type testing is easily compatible with both Py2 and Py3 and whether or not you use the unicode_literals import, use six.string_types to test against:

from __future__ import unicode_literals
from six import string_types

my_string = "my string"

# works in Py2 and Py3, with or without unicode literals:
isinstance(my_string, string_types)

# Output: 
>> True 

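
If pulling in six is not an option, the same check can be sketched with the standard library alone. This is only a minimal sketch and not how six does it internally; STRING_TYPES and is_string are hypothetical names made up for the example.

```python
# Minimal sketch: a six-free stand-in for six.string_types.
# STRING_TYPES and is_string are hypothetical names, not part of any library.
try:
    # Py2: basestring is the common parent of str and unicode.
    STRING_TYPES = (basestring,)  # noqa: F821
except NameError:
    # Py3: basestring is gone and str is the only text type.
    STRING_TYPES = (str,)


def is_string(value):
    """Return True for text strings on both Py2 and Py3."""
    return isinstance(value, STRING_TYPES)


print(is_string("my string"))   # True on Py2 and Py3
print(is_string(u"my string"))  # True on Py2 and Py3
print(is_string(42))            # False on Py2 and Py3
```

One difference to keep in mind: a b"..." literal passes this test in Py2 ( where bytes is str ) but fails it in Py3.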

explicitly declared binary strings

Some modules, such as PySide, require you to occasionally provide bytes objects as arguments.

The example I use here is for QPropertyAnimation:

https://doc.qt.io/qt-5/qpropertyanimation.html

In Py2, instantiating a QPropertyAnimation could be done with a standard string ( as str == bytes ):

from PySide2 import QtCore

# <widget> is the QObject whose property is animated
pa = QtCore.QPropertyAnimation(<widget>, "minimumHeight")
print(type(pa))

# Output:
>> <class 'PySide2.QtCore.QPropertyAnimation'>

In Py3, the string literal needs to be prefixed with a lower case ‘b’, or you need to pass in a pre-instantiated bytes object:

from PySide2 import QtCore

# <widget> is the QObject whose property is animated
pa = QtCore.QPropertyAnimation(<widget>, b"minimumHeight")
print(type(pa))

# Output:
>> <class 'PySide2.QtCore.QPropertyAnimation'>

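
When the property name is not a literal ( for example, it is read from a config or passed in by a caller ), a small helper can do the conversion in one place. This is a minimal sketch; ensure_bytes is a hypothetical name and the "utf-8" default is an assumption.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as a bytes object on both Py2 and Py3.

    Hypothetical helper for illustration; the utf-8 default is an assumption.
    """
    if isinstance(value, bytes):
        # Already bytes (this includes a plain Py2 str).
        return value
    # Py3 str or Py2 unicode: encode explicitly.
    return value.encode(encoding)


property_name = ensure_bytes("minimumHeight")
print(property_name == b"minimumHeight")  # True
```

The result can then be handed to any API that insists on a bytes argument, whichever interpreter is running.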

encoding

Py2 handles string encoding pretty much automatically. This makes life easy for beginners but causes problems once more advanced usage is required.

Py3 requires the type of encoding to be explicitly specified, rather than the internals intuiting the correct encoding type.

Here is a really good article on encoding in Py3:

https://realpython.com/python-encodings-guide/

In Py2 you were able to convert any string-like object to a bytes object without providing any encoding arguments.

my_string = "my string"
my_string_bytes = bytes(my_string)

# Technically this proves nothing as in Py2 str == bytes, but
# in Py3, it raises this exception:
>> TypeError: string argument without an encoding

To provide correct conversion to a bytes object in Py3, you need to provide a second argument with a valid encoding type:

my_string_bytes = bytes(my_string, "utf-8")

# Output:
>> b'my string'
# The 'b' denotes the string is a bytes object.

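
The two-argument bytes() call is Py3 only, as in Py2 bytes is just str and it only accepts one argument. For code that has to run on both, the encode and decode methods are the safer route. A minimal sketch, assuming UTF-8 is the encoding you want:

```python
text = u"caf\u00e9"  # the u prefix is valid in Py2 and in Py3.3+

encoded = text.encode("utf-8")     # text -> bytes
decoded = encoded.decode("utf-8")  # bytes -> text

print(len(text))        # 4 characters
print(len(encoded))     # 5 bytes, the accented character takes two bytes in UTF-8
print(decoded == text)  # True
```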

binary files

In short, if you save as binary, then load as binary!

Py2 is forgiving about mixing binary and non-binary saving and loading:

# If you save to file using "wb" in Py2:
with open(<file_path>, "wb") as f:
    f.write("This is saved as binary, so it should be loaded as binary")

# Py2 will be able to determine the format in the internals of open correctly:
with open(<file_path>, "r") as f:
    lines = f.readlines()

Py3, however, is strict: a file opened in binary mode only accepts and returns bytes, so load as binary if you save as binary:

# But in Py3, if you save to file using "wb", write a bytes object:
with open(<file_path>, "wb") as f:
    f.write(b"This is saved as binary, so it should be loaded as binary")

# Load the file using "rb":
with open(<file_path>, "rb") as f:
    lines = f.readlines()

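
Here is the same round trip as a runnable sketch that behaves identically in Py2 and Py3. The scratch file path and the choice of UTF-8 are assumptions made for the example.

```python
import os
import tempfile

payload = u"This is saved as binary, so it should be loaded as binary"

# Hypothetical scratch file, used only for this example.
file_path = os.path.join(tempfile.mkdtemp(), "example.bin")

# Save as binary: encode the text to bytes before writing.
with open(file_path, "wb") as f:
    f.write(payload.encode("utf-8"))

# Load as binary: read the bytes back, then decode them to text.
with open(file_path, "rb") as f:
    loaded = f.read().decode("utf-8")

print(loaded == payload)  # True
```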

All of the relatively simple issues above tie into the next, and biggest, problem area I faced:

pickle

The pickle implementation in Py3 is the C implementation from Py2 ( cPickle ).
This means cPickle no longer exists in Py3. Other than this difference, the implementation is also updated with extra protocols and encoding gotchas, so let’s go through them:

import

This is one of the imports with no helper in futures and the six helper is a bit unhelpful especially if the intention is to move to Py3, so I opted to manage it with a version check. This can be achieved in one of two ways:

Using six:

from six import PY3 as ISPY3

if ISPY3:
    import pickle
else:
    import cPickle as pickle

Or checking the version number manually:

import sys

ISPY3 = False
if sys.version_info >= (3, 0):
    ISPY3 = True
    import pickle
else:
    import cPickle as pickle

Obviously, the six method is the more elegant choice – but variety is the spice of life…
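
A third option is the common try/except import idiom, which skips the version check entirely. It is offered here as an alternative sketch rather than one of the two methods above, and note that it does not give you an ISPY3 flag to reuse elsewhere.

```python
try:
    import cPickle as pickle  # Py2: the fast C implementation
except ImportError:
    import pickle  # Py3: the C accelerator is used automatically

# Quick round trip to show the imported module behaves the same either way.
data = {"name": "rig", "frames": [1, 2, 3]}
restored = pickle.loads(pickle.dumps(data, protocol=2))
print(restored == data)  # True
```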

numpy and encodings

If you have pickled datetime, date or numpy data in Py2, you will have to unpickle it in Py3 using the “latin1” encoding.

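from six import PY3 as ISPY3

if ISPY3:
    import pickle
else:
    import cPickle as pickle

# pickle.load expects a file object opened in binary mode
with open(<pickled_numpy_file_path>, "rb") as f:
    if ISPY3:
        # the encoding argument only exists in Py3
        data = pickle.load(f, encoding="latin1")
    else:
        data = pickle.load(f)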

Other encodings to note:

  • “ASCII” – the default encoding type. 
  • “bytes” - read these 8-bit string instances as bytes objects 
  • “latin1” - is required when unpickling numpy arrays and instances of datetime, date and time pickled by Py2. 
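
The difference between the three encodings can be seen without a Py2 interpreter. The sketch below is Py3 only and uses a hand-written byte string that stands in for a Py2 protocol 2 pickle of the 8-bit str '\xe9'; that byte string is an assumption made for illustration, not data from a real project.

```python
import pickle

# Hand-written bytes standing in for a Py2 protocol 2 pickle of the
# 8-bit str '\xe9' (an assumption for illustration, Py3 only).
PY2_PICKLE = b"\x80\x02U\x01\xe9q\x00."

# The default encoding ("ASCII") cannot decode the high byte.
try:
    pickle.loads(PY2_PICKLE)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# "latin1" maps every byte to a character, so it always succeeds.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")

# "bytes" leaves the 8-bit string untouched as a bytes object.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(ascii_failed)
print(repr(as_text))
print(repr(as_bytes))
```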

protocols

If you have pickle data you intend to load in both Py2 and Py3, then make sure you set the protocol argument when pickling in Py3 to 2 or lower. This is because protocol 2 is the highest level implemented in Py2 pickle.

An integer, the default protocol version used for pickling. May be less than HIGHEST_PROTOCOL. Currently the default protocol is 4, first introduced in Python 3.4 and incompatible with previous versions.

https://docs.python.org/3/library/pickle.html#pickle.DEFAULT_PROTOCOL

The optional protocol argument, an integer, tells the pickler to use the given protocol; supported protocols are 0 to HIGHEST_PROTOCOL. If not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.

https://docs.python.org/3/library/pickle.html#pickle.Pickler

from six import PY3 as ISPY3

if ISPY3:
    import pickle
else:
    import cPickle as pickle

# pickle.dump expects a file object opened in binary mode
with open(<file_path>, "wb") as f:
    pickle.dump(<data>, f, protocol=2)

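
If you need to audit which existing pickle data is safe to load in Py2, the protocol can be read from the first two bytes. This is a minimal sketch; pickle_protocol is a hypothetical helper, and it cannot tell protocols 0 and 1 apart because they carry no header.

```python
import pickle


def pickle_protocol(raw):
    """Return the protocol number for protocol 2+ data, else None.

    Hypothetical helper: protocol 2 and above start with the PROTO
    opcode (0x80) followed by the protocol number.
    """
    header = bytearray(raw[:2])
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None  # protocol 0 or 1, which carry no header


data = {"name": "rig", "frames": [1, 2, 3]}
portable = pickle.dumps(data, protocol=2)

print(pickle_protocol(portable))                         # 2
print(pickle_protocol(pickle.dumps(data, protocol=0)))   # None
print(pickle.loads(portable) == data)                    # True
```

Anything that reports a number higher than 2 would need re-pickling with protocol=2 before Py2 can read it.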

perforce

Using Perforce adds an extra layer of complexity.
If you have saved Py2 pickle files in binary format and then checked them into Perforce without explicitly setting the file type in Perforce to binary, you will encounter a number of side-effects, ranging from the file not loading to data corruption when unpickling in Py3.

This is because Perforce treats the file as text when it stores it on the server if its file extension is not specified as binary. This breaks the line endings and, in some cases, other more vital bits of the encoded data.

I was able to repair the files in some instances by cleaning up the line endings when loading in Py3 using this approach:

https://stackoverflow.com/questions/8527241/python-pickle-fix-r-characters-before-loading
However, this is not ideal as it can cause further problems when going back into Py2.
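
The general idea of that kind of repair can be sketched as follows. This is an illustration under assumptions, not the exact code from the linked answer: loads_with_fixed_line_endings is a hypothetical name, and the blind replacement is only reasonably safe for protocol 0 ( ASCII ) pickles, as a binary-protocol pickle can legitimately contain the same two bytes.

```python
import pickle


def loads_with_fixed_line_endings(raw):
    """Unpickle bytes after undoing a LF -> CRLF text conversion.

    Hypothetical helper; only reasonably safe for protocol 0 pickles.
    """
    return pickle.loads(raw.replace(b"\r\n", b"\n"))


data = {"name": "rig", "frames": [1, 2, 3]}

# Simulate the damage: a protocol 0 pickle whose line endings were
# rewritten as if it had been stored as a text file.
original = pickle.dumps(data, protocol=0)
damaged = original.replace(b"\n", b"\r\n")

print(loads_with_fixed_line_endings(damaged) == data)  # True
```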

In short: make sure you save your pickle data as binary and your pickle file extension is set to a binary file type in Perforce ( or whichever version control you use ).

If you were not aware of this before saving a bunch of pickle data as binary and checking it into Perforce as a text file ( like someone I know whose name rhymes with O’Shea *cough – cough* ), then here are some extra gotchas for trying to clean up the files:

Changing the file type of the extension once it is already in Perforce will not change the previous revisions’ file types, only the revisions from the point in the history of when the file type changed.

If you have already checked in a pickle binary file as a text file and then realise there is a pickle binary file extension registered on the server ( thanks to some helpful person who is smarter than you ), then just changing the incorrect extension to the correct, pickle binary extension will not work. The file type has to be explicitly changed for Perforce to update correctly too!


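This is done in one of two ways:

  • Open the file for edit, then move it to the correct extension while explicitly setting the file type with the -t flag.
p4 edit fileName.textpkl
p4 move -t binary fileName.textpkl fileName.binarypkl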

  • Copy the old file, changing the extension to the correct binary one. Delete the original file with the incorrect extension from Perforce. Finally add the duplicate yet correctly extensioned file to Perforce. Perforce will then automatically recognise the extension and associate it with the correct file type.
    • This will not retain revision history.
copy /Y fileName.textpkl fileName.binarypkl
p4 delete fileName.textpkl
p4 add fileName.binarypkl

Here is a great reference for other Py2 vs Py3 pickle problems:

https://rebeccabilbro.github.io/convert-py2-pickles-to-py3/

That’s it for now – so all I will say is welcome to the Python3 party, we in the VFX and Games Industries are only 15 years late…

In part two I will go over some other issues ( like the zipped 2to3 doozy ) and some tooling I wrote to make the process easier as it can be quite tedious to do this all from the command line…
