While migrating my Python 2.7 scripts to 3.7, I keep stumbling over things such as Unicode handling and float-to-string conversion. I figure it’s worth the time to read up systematically on the new features in the official Python documentation:
First, print is now a function. This looks easy on the surface, but note how the old statement syntax maps onto keyword arguments:
Old: print x,           # Trailing comma suppresses newline
New: print(x, end=" ")  # Appends a space instead of a newline
print("There are <", 2**32, "> possibilities!", sep="")
# output: There are <4294967296> possibilities!
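To make the keyword arguments concrete, here is a small sketch of my own (the variable names and strings are made up for illustration, not taken from the official notes). It shows end, sep and file, which together cover most of what the old print statement syntax could do:

```python
import io

x = "spam"

# end=" " replaces the trailing newline with a space (Python 2: print x,)
print(x, end=" ")
print("eggs")  # the two calls together print: spam eggs

# sep is the string placed between several arguments (default: one space)
print("2019", "07", "01", sep="-")  # prints: 2019-07-01

# file= redirects the output (Python 2: print >>f, x)
buffer = io.StringIO()
print("a", "b", sep="", file=buffer)
captured = buffer.getvalue()  # "ab\n"
```

Because print is an ordinary function now, it can also be passed around or replaced like any other callable.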
Next, views and iterators instead of lists. The dict methods dict.keys(), dict.items() and dict.values() return “views” instead of lists. For example, this no longer works: k = d.keys(); k.sort(). Use k = sorted(d) instead (this works in Python 2.5 too and is just as efficient). I don’t use map and filter often, but map() and filter() now return iterators as well. If you really need a list and the input sequences are all of equal length, a quick fix is to wrap map() in list(), e.g. list(map(…)), but a better fix is often to use a list comprehension (especially when the original code uses lambda), or to rewrite the code so it doesn’t need a list at all. Also important to know:
- range() now behaves like xrange() used to behave, except it works with values of arbitrary size. The latter no longer exists.
- zip() now returns an iterator
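A short sketch of these behaviors, using a made-up dict and made-up variable names, and assuming Python 3.7 or later (where dicts keep insertion order):

```python
d = {"b": 2, "a": 1}

# keys() returns a view, not a list, so there is no sort() method on it
keys = d.keys()
has_sort = hasattr(keys, "sort")  # False

# sorted() accepts any iterable and returns a new list
k = sorted(d)  # ['a', 'b']

# A view is live: it reflects later changes to the dict
d["c"] = 3
live = list(keys)  # ['b', 'a', 'c']

# map() is lazy; wrap it in list() or use a list comprehension
squares_map = list(map(lambda n: n * n, [1, 2, 3]))
squares_comp = [n * n for n in [1, 2, 3]]

# range() is lazy like the old xrange() and accepts arbitrarily large values
big = range(10**20)
sixth = big[5]  # 5, computed without building a list

# zip() returns an iterator, which is exhausted after one pass
pairs = zip("ab", [1, 2])
first_pass = list(pairs)   # [('a', 1), ('b', 2)]
second_pass = list(pairs)  # []
```

The exhausted zip() iterator is the one that bit me most often: code that loops over the same result twice silently gets nothing the second time.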
The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False.
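The following sketch shows the new behavior. The helper try_compare and the sample values are my own, purely for illustration:

```python
def try_compare(a, b):
    """Return the result of a < b, or the string 'TypeError' if it is not allowed."""
    try:
        return a < b
    except TypeError:
        return "TypeError"

r1 = try_compare(1, "")       # 'TypeError'
r2 = try_compare(None, None)  # 'TypeError'
r3 = try_compare(1, 2.5)      # True: int and float still compare fine

# Equality comparisons between unrelated types are still allowed
eq = (1 == "")  # False

# Sorting a mixed list therefore needs an explicit key
mixed = [3, "10", 2]
sorted_mixed = sorted(mixed, key=int)  # [2, 3, '10']
```

In practice this mostly shows up when sorting lists that contain None or a mix of numbers and strings.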
There are also some tricky details around integers and floats. The repr() of a long integer doesn’t include the trailing L anymore, so code that unconditionally strips that character will chop off the last digit instead. (Use str() instead.)
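A tiny illustration of that pitfall, with an arbitrary large number of my choosing:

```python
n = 2 ** 64

text = repr(n)         # '18446744073709551616', no trailing L in Python 3
broken = repr(n)[:-1]  # old Python 2 habit: now silently drops the final digit
safe = str(n)          # same digits as repr(n)
```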
The section that grabbed my attention most is “Text Vs. Data Instead Of Unicode Vs. 8-bit”: everything you thought you knew about binary data and Unicode has changed. Below I jot down some of the key points:
Python 3 uses the concepts of text and (binary) data instead of Unicode strings and 8-bit strings. All text is Unicode; however, encoded Unicode is represented as binary data. The type that holds text is str and the type that holds data is bytes. The biggest difference with the 2.x situation is that any attempt to mix text and data in Python 3.0 raises TypeError. So what should we do? In 2.x code that is being prepared for migration, use unicode for all unencoded text, and str for binary or encoded data only; in 3.x these roles are played by str and bytes. As the str and bytes types cannot be mixed, you must always explicitly convert between them. Use str.encode() to go from str to bytes, and bytes.decode() to go from bytes to str. You can also use bytes(s, encoding=…) and str(b, encoding=…), respectively.
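A minimal round trip between the two types, assuming UTF-8 as the encoding (the sample string is arbitrary):

```python
text = "café"

# str -> bytes and back again
data = text.encode("utf-8")        # b'caf\xc3\xa9'
round_trip = data.decode("utf-8")  # 'café'

# The constructor forms do the same thing
same_data = bytes(text, encoding="utf-8")
same_text = str(data, encoding="utf-8")

# Mixing the two types raises TypeError instead of guessing an encoding
try:
    text + data
    mixed = "no error"
except TypeError:
    mixed = "TypeError"

# One character can take more than one byte once encoded
lengths = (len(text), len(data))  # (4, 5)
```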
Files opened as text files (still the default mode for open()) always use an encoding to map between strings (in memory) and bytes (on disk).
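As a sketch of what that means in practice, the snippet below writes and reads a temporary file (the path and content are invented for the demo). Passing encoding explicitly is my own habit here, so the result does not depend on the platform default:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: str in memory is encoded to bytes on disk
with open(path, "w", encoding="utf-8") as f:
    f.write("naïve\n")

# Text mode again: bytes on disk are decoded back to str
with open(path, encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: no decoding happens, you get the raw bytes
with open(path, "rb") as f:
    as_bytes = f.read()
```

Here as_text is a str, while as_bytes is a bytes object in which the ï occupies two bytes.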
!= now returns the opposite of ==, unless == returns NotImplemented. I previously used <>, which no longer exists and has to be replaced by !=.
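To see the derived != at work, here is a toy class of my own that defines only __eq__:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

p, q = Point(1, 2), Point(1, 2)

equal = (p == q)       # True
not_equal = (p != q)   # False: derived from __eq__, no __ne__ needed
other_type = (p != 5)  # True: both sides return NotImplemented, so identity decides
```

In Python 2 the same class would have needed an explicit __ne__ to keep == and != consistent.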
I used to use pd.read_csv without issues, but since the text/bytes change mentioned above I have had to work out alternatives. 1. The csv module can be used:
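import csv

lines = []
# dirpath2 is the path to the CSV file
with open(dirpath2, newline='') as csvfile:
    jwcreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in jwcreader:
        lines.append(row)  # each row is a list of strings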
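2. Detect the encoding with chardet first, then pass it to pandas:
import chardet
import pandas as pd
with open(dirpath2, 'rb') as f:
    result = chardet.detect(f.read())  # or f.readline() if the file is large
exchange_country = pd.read_csv(dirpath2, encoding=result['encoding'])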
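If chardet is not installed, a cruder standard-library fallback is to try a short list of candidate encodings in order. This is a different technique from detection, and everything in the sketch below is hypothetical (the helper read_csv_rows, the candidate list, and the demo file). Note that latin-1 can decode any byte sequence, so it only makes sense as the last resort:

```python
import csv
import os
import tempfile

def read_csv_rows(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: return (encoding, rows) for the first encoding that decodes."""
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as csvfile:
                return encoding, list(csv.reader(csvfile))
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode the file")

# Demo file written as latin-1, so the utf-8 attempt fails first
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "wb") as f:
    f.write("country,code\ncafé,1\n".encode("latin-1"))

used_encoding, rows = read_csv_rows(demo_path)
```

A successful decode only means the bytes were valid in that encoding, not that the guess was right, so chardet remains the better choice when it is available.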
Reading in data after setting cell formats in Excel caused hiccups too.
Excel automatically displays long numbers in scientific notation, which often messes up reading the data into Python. Adding an apostrophe in front of the numeric value does the trick; alternatively, change the cell format by going to “Custom” and choosing the type “0”, which works like a charm.
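To show why this matters on the Python side, here is a sketch with made-up data. It assumes the sheet was exported to CSV, and that the same ID was saved once from a default cell (written out in scientific notation) and once from a cell formatted as “0”:

```python
import csv
import io

# Hypothetical export: one ID stored in two differently formatted cells
exported = io.StringIO("as_general,as_text\n1.23457E+17,123456789012345678\n")
row = next(csv.DictReader(exported))

# The scientific form has already lost most of the digits
from_general = int(float(row["as_general"]))

# The fully written-out form survives intact
from_text = int(row["as_text"])

digits_lost = (from_general != from_text)  # True
```

Once the digits are gone in the exported file, nothing on the Python side can recover them, so the fix has to happen in Excel before saving.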