1.6.5.2 Putting it all together

Now let's put it all together. We've made a couple of other changes to our code, including defining a variable called numGames = 10000 at the very top to set the size of our range.

# Simulates 10K games of Hi Ho! Cherry-O 
# Setup _very_ simple timing. 
import time
import os
import sys
start_time = time.time() 
import multiprocessing 
from statistics import mean 
import random 
import arcpy  # used in mp_handler() below 
numGames = 10000 

def cherryO(game): 
    spinnerChoices = [-1, -2, -3, -4, 2, 2, 10] 
    turns = 0 
    cherriesOnTree = 10 

    # Take a turn as long as you have more than 0 cherries 
    
    while cherriesOnTree > 0: 
        # Spin the spinner 
        spinIndex = random.randrange(0, 7) 
        spinResult = spinnerChoices[spinIndex] 
        # Print the spin result     
        #print ("You spun " + str(spinResult) + ".") 
        # Add or remove cherries based on the result 
        cherriesOnTree += spinResult 
        # Make sure the number of cherries is between 0 and 10    
        if cherriesOnTree > 10: 
            cherriesOnTree = 10 
        elif cherriesOnTree < 0: 
            cherriesOnTree = 0 
        # Print the number of cherries on the tree        
        #print ("You have " + str(cherriesOnTree) + " cherries on your tree.")
        turns += 1 
    # return the number of turns it took to win the game 
    return turns 

def mp_handler():
    # Set the python exe. Make sure the pythonw.exe is used for running processes, even when this is run as a
    # script tool, or it may launch n number of Pro applications.
    multiprocessing.set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe'))
    arcpy.AddMessage(f"Using {os.path.join(sys.exec_prefix, 'pythonw.exe')}")
    
    with multiprocessing.Pool(multiprocessing.cpu_count()) as myPool:
        # The "map" part of MapReduce happens on the right of the = and the
        # "reduce" part on the left, where we aggregate the results into a list.
        turns = myPool.map(cherryO, range(numGames)) 
    # Uncomment this line to print out the list of total turns (but note this will slow down your code's execution) 
    #print(turns) 
    # Use the statistics library function mean() to calculate the mean of turns 
    print(mean(turns)) 

if __name__ == '__main__': 
    mp_handler() 
    # Output how long the process took. 
    print("--- %s seconds ---" % (time.time() - start_time)) 

You will also see that the list of results is captured on the left side of the = in the call to myPool.map() inside mp_handler(). We're taking all of the returned results and putting them into a list called turns (feel free to add a print or type statement here to check that it's a list). Once all of the workers have finished playing their games, we use the mean() function from the Python statistics library, which we imported at the very top of our code, to calculate the mean of our list turns. The call to mean() acts as our reduce step: it takes our list and returns the single value that we're really interested in.
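If the map/reduce split isn't clear yet, here is a minimal, self-contained sketch of the same pattern. The toy_game() function and its "turn" formula are invented purely for illustration, standing in for cherryO():

```python
# Minimal sketch of the map/reduce pattern used above, with a toy
# stand-in for cherryO(). toy_game() is a made-up example function.
import multiprocessing
from statistics import mean

def toy_game(game):
    # Pretend every game takes (game % 5) + 10 turns to win.
    return (game % 5) + 10

if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        turns = pool.map(toy_game, range(10))   # the "map" step
    print(type(turns))   # confirms we got a plain list back
    print(mean(turns))   # the "reduce" step: one summary value
```

The `__name__ == '__main__'` guard matters here just as it does in the real script: on Windows, each worker re-imports the module, and the guard stops the workers from trying to launch pools of their own.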

When you have finished writing the code in PyScripter, you can run it. 

We will use the environment's Python Command prompt to run this script. There are two quick ways to start a command window in your environment:

In your Windows start menu:

  • Search for "Python Command Prompt" and it should appear as the "Best match". After opening it, verify that it opened the environment you want to work in (details below).
  • Or, navigate to it manually: click "All" to switch to the application list view, scroll down and expand the ArcGIS folder to list all installed ArcGIS applications, then open the Python Command Prompt.

This is a shortcut to open a command window in the activated Python environment. Once opened, you should see the environment name in parentheses followed by the full path to the Python environment.

We could dedicate an entire class to operating system commands that you can use in the command window, but Microsoft has a good resource at this Windows Commands page for those who are interested.

We just need a couple of the commands listed there:

  • cd : change directory. We use this to move around our folders. Full help at this Commands/cd page.
  • dir : list the files and folders in the current directory. Full help at this Commands/dir page.

We’ll change the directory to where we saved the code from above (e.g. mine is in c:\489\lesson1) with the following command:

cd c:\489\lesson1

Before you run the code for the first time, we suggest you change the number of games to a much smaller number (e.g. 5 or 10) just to check everything is working fine so you don’t spawn 10,000 Python instances that you need to kill off. In the event that something does go horribly wrong with your multiprocessing code, see the information about the Windows taskkill command below. To now run the Cherry-O script (which we saved under the name cherry-o.py) in the command window, we use the command:

python cherry-o.py

You should now get the output from the different print statements, in particular the average number of turns and the time it took to run the script. If everything went ok, set the number of games back to 10000 and run the script again.

It is useful to know that there is a Windows command that can kill off all of your Python processes quickly and easily. Imagine having to open Task Manager and manually kill them off, answer a prompt and then move to the next one! The easiest way to access the command is by pressing your Windows key, typing taskkill /im python.exe and hitting Enter, which will kill off every task called python.exe. It’s important to only use this when absolutely necessary, as it will usually also stop your IDE from running and any other Python processes that are legitimately running in the background. The full help for taskkill is at the Microsoft Windows IT Pro Center taskkill page.
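You can also reduce the need for taskkill in your own scripts by cleaning up worker processes explicitly. The `with multiprocessing.Pool(...)` context manager in mp_handler() already does this; the sketch below spells out the equivalent try/finally pattern, using a made-up work() function as the task:

```python
# A gentler alternative to taskkill: make sure workers are terminated
# even if something goes wrong mid-run. work() is a made-up stand-in.
import multiprocessing

def work(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    try:
        results = pool.map(work, range(5))
        print(results)      # [0, 1, 4, 9, 16]
    finally:
        pool.terminate()    # stop any workers that are still alive
        pool.join()         # wait for them to exit cleanly
```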

Look closely at the images below, which show a four processor PC running the sequential and multiprocessing versions of the Cherry-O code. In the sequential version, you’ll see that the CPU usage is relatively low (around 50%) and there are two instances of Python running (one for the code and (at least) one for PyScripter).

In the multiprocessing version, the code was run from the command line instead (which is why it’s sitting within a Windows Command Processor task) and you can see the CPU usage is pegged at 100% as all of the processors are working as hard as they can and there are five instances of Python running.

This might seem odd as there are only four processors, so what is that extra instance doing? Four of the Python instances, the ones all working hard, are the workers, the fifth one that isn’t working hard is the master process which launched the workers – it is waiting for the results to come back from the workers. There isn’t another Python instance for PyScripter because I ran the code from the command prompt – therefore, PyScripter wasn’t running. We'll cover running code from the command prompt in the Profiling section.

screenshot in task manager of sequential code
Figure 1.11 Cherry-O sequential code Task Manager Tasks
screenshot of task manager performance CPU workload (4 graphs)
Figure 1.12 Cherry-O sequential code Task Manager workload
screenshot of manager multiprocessing tasks
Figure 1.13 Cherry-O multiprocessing Task Manager Tasks
screenshot of task manager performance CPU (4 graphs no data in them)
Figure 1.14 Cherry-O multiprocessing Task Manager workload

On this four-processor PC, this code runs in about 1 second and returns an answer of between 15 and 16. That is about three times slower than my sequential version, which ran in 1/3 of a second. If I play 1M games instead of 10K, the parallel version takes 20 seconds on average and my sequential version takes on average 52 seconds. If I run the game 100M times, the parallel version takes around 1,600 seconds (26 minutes) while the sequential version takes 2,646 seconds (44 minutes). The more games I play, the better the performance of the parallel version. Those results aren't as fast as you might expect with 4 processors in the multiprocessing version, but it is still around half the time taken. When we look at profiling our code a bit later in this lesson, we'll examine why this code isn't running 4x faster.
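To make the comparison concrete, here is a quick sketch that computes the speedup ratios from the timings reported above (the numbers are copied from the text, not re-measured):

```python
# Speedup = sequential time / parallel time, using the timings
# reported in the text for the 4-processor PC.
timings_s = {
    "10K games":  {"sequential": 1 / 3, "parallel": 1},
    "1M games":   {"sequential": 52,    "parallel": 20},
    "100M games": {"sequential": 2646,  "parallel": 1600},
}

for label, t in timings_s.items():
    speedup = t["sequential"] / t["parallel"]
    print(f"{label}: {speedup:.2f}x speedup")
```

A speedup below 1 (the 10K case) means the parallel version was actually slower, because the cost of launching the worker processes outweighed the tiny amount of work each one did.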

When moving the code to a much more powerful PC with 32 processors, there is a much more significant performance improvement. The parallel version plays 100M games in 273 seconds (< 5 minutes) while the sequential version takes 3136 seconds (52 minutes) which is about 11 times slower. Below you can see what the task manager looks like for the 32 core PC in sequential and multiprocessing mode. In sequential mode, only one of the processors is working hard – in the middle of the third row – while the others are either idle or doing the occasional, unrelated background task. It is a different story for the multiprocessor mode where the cores are all running at 100%. The spike you can see from 0 is when the code was started.

screenshot of task manager performance, CPU, 32 graphs
Figure 1.15 Cherry-O Seq_Server
screenshot of task manager performance, CPU, 32 graphs w/ sharp slopes
Figure 1.16 Cherry-O MP_Server

Let's examine some of the reasons for these speed differences. The 4-processor PC's CPU runs at 3GHz while the 32-processor PC runs at 2.4GHz; the extra cycles that the 4-processor CPU can perform per second make it a little quicker at math. The reason the multiprocessing code runs much faster on the 32-processor PC than the 4-processor PC is straightforward enough: there are 8 times as many processors (although it isn't 8 times faster, it is close at roughly 5.9x, or 1,600 seconds / 273 seconds). So while each individual processor is a little slower on the larger PC, because there are so many more, it catches up (but not quite to 8x faster due to each processor being a little slower).

Memory quantity isn’t really an issue here as the numbers being calculated are very small, but if we were doing bigger operations, the 4-processor PC with just 8GB of RAM would be slower than the 32-processor PC with 128GB. The memory in the 32-processor PC is also faster at 2.13 GHz versus 1.6GHz in the 4-processor PC.

So the takeaway message here is that if you have a lot of tasks that are largely the same but independent of each other, you can save a significant amount of time by utilizing all of the resources within your PC with the help of multiprocessing. The more powerful the PC, the more time that can potentially be saved. However, the caveat is that, as already noted, multiprocessing is generally only faster for CPU-bound processes, not I/O-bound ones.