Lessons learned from EuroPython 2016

This was my first EuroPython conference and I had high expectations because I had heard a lot of good things about it. I must say that overall it didn't let me down: I learned several new things and met a lot of new people. So let's dive straight into the most important lessons.

On Tuesday I attended the "Effective Python for High-Performance Parallel Computing" training session by Michael McKerns. It was by far my favorite training session and I learned a lot from it. Before Michael started with code examples and analysis, he emphasized two things:

  1. Do not assume what you hear/read/think. Time it and measure it.
  2. Stupid code is fast! Intelligent code is slow!

At this point I knew that the session was going to be amazing. He gave us a GitHub link (https://github.com/mmckerns/tuthpc) where all the examples with profiler results are located. He stressed that we shouldn't just believe him and that we should test them ourselves (lesson #1).

I strongly suggest cloning his GitHub repo (https://github.com/mmckerns/tuthpc) and testing those examples yourself. Here are my quick notes (TL;DR); there is a small timeit sketch after the list:

  • always compile regular expressions
  • bind globals to local variables in hot code (true = True, local = GLOBAL), since local lookups are faster
  • if you know how many elements your list will have, create it pre-filled with None and then fill in the values (L = [None] * N)
  • when inserting items at index 0 of a list, append and then reverse instead (insert(0, x) is O(n), append is O(1))
  • use built-in functions, use built-in functions, use built-in functions!!! (they are implemented in C)
  • when extending a list use .extend() and not +
  • searching in a set (hash map) is a lot faster than searching in a list (O(1) vs O(n))
  • constructing a set is much slower than constructing a list, so you usually don't want to turn a list into a set just to search it once, because it will be slower overall. But again, you should test it
  • += on mutable objects (e.g. lists) doesn't create a new instance, so use it in loops
  • a list comprehension is better than a generator; a for loop is better than a generator and sometimes also better than a list comprehension (you should test it!)
  • importing is expensive (e.g. importing numpy takes ~0.1 s)
  • converting between Python lists and NumPy arrays is very expensive
  • if you start writing intelligent and complex code, stop and rethink whether there is a more "stupid" way of achieving your goal (see lesson #2)
  • optimize the code you want to run in parallel; that is more important than just running it in parallel
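
To follow lesson #1, here is a minimal sketch of how a couple of these claims can be checked with timeit; the collection sizes and iteration counts are arbitrary, and the numbers will differ per machine and Python version:

    import timeit

    setup = "items = list(range(10000)); lookup = set(items)"

    # Membership test: list scan vs. set hash lookup (O(n) vs O(1))
    print("list search:", timeit.timeit("9999 in items", setup=setup, number=10000))
    print("set search: ", timeit.timeit("9999 in lookup", setup=setup, number=10000))

    # Growing a list by appending vs. pre-allocating it with None
    print("append:  ", timeit.timeit("l = []\nfor i in range(1000): l.append(i)", number=1000))
    print("prealloc:", timeit.timeit("l = [None] * 1000\nfor i in range(1000): l[i] = i", number=1000))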

Threading and multiprocessing:

  • always measure whether threading/multiprocessing is actually faster; if you are parallelizing simple functions it will probably be slower
  • in parallel computing you need to catch and log errors
  • in parallel computing you always want your functions to return a value
  • in parallel computing you never want your code to "die". Always try to return a reasonable default value even if an exception is raised. A slightly wrong answer is better than no answer at all!
  • when using threading/multiprocessing use .map(), and if you don't care about the order use .imap_unordered(). It is the fastest because it yields each result as soon as it is available (see the sketch after this list)
  • if you have a stop condition, use .imap_unordered()
  • be aware of the random module's behaviour: the random seed gets copied to all worker processes, with the result that "random doesn't work" (every worker produces the same numbers). You need a re-seeding function to make sure each process is in a different random state
  • is there any general rule for when to use threads and when to use multiprocessing? Use threads if you have light jobs (i.e. they execute in 0-1 sec)

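Here is a minimal sketch that combines several of these points: a worker that never dies and always returns a default value, per-process re-seeding via an initializer, and imap_unordered for collecting results. The work function and the seeding strategy are my own illustration, not taken from the training material.

    import logging
    import os
    import random
    from multiprocessing import Pool

    def reseed():
        # Each worker inherits the parent's random state, so re-seed per process.
        random.seed(os.getpid())

    def work(n):
        try:
            return n, random.random() ** n
        except Exception:
            # Never let a worker die: log the error and return a sensible default.
            logging.exception("work(%s) failed", n)
            return n, 0.0

    if __name__ == "__main__":
        with Pool(processes=4, initializer=reseed) as pool:
            # imap_unordered yields results as soon as any worker finishes.
            for n, value in pool.imap_unordered(work, range(20)):
                print(n, value)
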
Another interesting talk was about code review ("Another pair of eyes: Reviewing code well" by Adam Dangoor). He pointed out that one of the most important aspects of the code review process is sharing knowledge. When you review someone else's code you learn a lot, especially if you take your time and try to really understand what the author was trying to achieve. It is also recommended to always say something nice about the code, especially when reviewing the code of a junior developer. And when you think the code you are reviewing has a bug, write a test that proves it.

EuroPython 2016 was an amazing event that every Python developer/scientist should experience. I'm really looking forward to EuroPython 2017!

Dexterity vs. Archetypes

TL;DR: migrating your Archetypes content to Dexterity shrinks your Data.fs considerably!

I've started looking into migrating the Archetypes content in one of the sites we're running to Dexterity. But before I started coding I wanted to make sure the juice was worth the squeeze.

The site contains roughly 130k content objects. Most of them are of the default Plone File type, with a few additional string fields (added with archetypes.schemaextender) and a custom workflow. Nothing fancy, just your standard integration project. New content is imported into the site in batches of ~10k items every now and then with the help of the awesome collective.transmogrifier.

To test how Dexterity compares to Archetypes for our use case, I first created a new Dexterity type that matched the feature set of the Archetypes type. Then I created a fresh instance, used Transmogrifier to import 13k items (10% of our production data), and ran some numbers.
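
For illustration, this is roughly what such a Dexterity schema might look like; the field names are made up and the exact imports depend on the Plone/Dexterity versions in use:

    from plone.namedfile.field import NamedBlobFile
    from plone.supermodel import model
    from zope import schema

    class ICustomFile(model.Schema):
        """A hypothetical File-like type with the extra string fields that
        were previously added via archetypes.schemaextender."""

        file = NamedBlobFile(title=u"File", required=True)
        reference_number = schema.TextLine(title=u"Reference number", required=False)
        source_system = schema.TextLine(title=u"Source system", required=False)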

The results are pretty amazing. With Archetypes, an import of 13k items into a fresh Plone 4.2 site took 61 minutes and the resulting Data.fs was 144 MB (details). With Dexterity, the same import took only 18 minutes and the resulting Data.fs was 64 MB (details). That's a whopping 70% decrease in import time and a 55% decrease in Data.fs size!

More than enough of a reason to invest in rewriting our content types and writing that migration script.