ESI, Concurrent Programming, and Pagination
This blog post is part of a series of blogs examining best practices for ESI development. Each blog will be published on the 8th of each month during the journey towards XML API and CREST’s termination date. The legacy APIs will be terminated on May 8th, 2018, or earlier if metrics signal a trivial level of usage.
This blog will cover the concept of concurrent programming and how to use this model of programming to improve your software's performance while pulling data from ESI. If you are already familiar with the concept of concurrent programming, here is an ESI pagination TL;DR: use the X-Pages header returned from paginated ESI endpoints to determine how many calls are needed to get all data from a given endpoint.
Prerequisites
This blog includes examples of Linux commands as well as Python code, so to follow along exactly you will need a Linux-based system and Python 3. It is also assumed that you have both pip and virtualenv installed.
Have you ever used ESI's /markets/{region_id}/orders/ endpoint? Did you know that this endpoint is paginated and allows you to retrieve more than just the first 10,000 orders? Did you know that there are more ESI endpoints than this that support pagination? If the answer to one or more of these questions was no, then this blog is for you! If you answered yes but you have never been quite sure how you are supposed to utilize pagination in ESI, then this blog is also for you! Using the /markets/{region_id}/orders/ endpoint, this blog will walk you through getting all orders from a specific region in a concurrent manner. This method can then be used for other paginated ESI endpoints or applied generally to concurrently fetch information from multiple endpoints.
How ESI Handles Pagination
ESI handles pagination the same way for most[1] endpoints. Here's a high-level example of getting the paginated data from the market orders endpoint:
1. Send a request to https://esi.tech.ccp.is/v1/markets/10000042/orders/ to get the first 10,000 orders in the Metropolis region.
2. ESI returns a response that contains 10,000 orders along with the header X-Pages, which has a value of 10. This tells you that all active orders in this region are split into 10 pages.
3. Knowing that there are ten pages, you can now send nine more requests (because you already have page 1), starting at page 2 and incrementing up to 10. You do this by adding a page query parameter to the URL.
4. After making these ten requests you should have all orders in Metropolis.
Remember that the number returned by X-Pages will vary depending on the time of day and region; it is simply a way of letting you know how much more data there is. Each paginated endpoint in ESI may have a different maximum number of items per page, which is indicated by the maxItems property in that endpoint's swagger spec.
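The page arithmetic described above can be sketched as a small helper. This is only an illustration, not part of ESI or any library: the function name remaining_pages is hypothetical, and the sketch assumes only that the X-Pages header arrives as a string holding a positive integer.

```python
def remaining_pages(x_pages_header):
    """Return the page numbers still to fetch after page 1.

    x_pages_header is the raw X-Pages header value, e.g. "10".
    (Hypothetical helper, for illustration only.)
    """
    total_pages = int(x_pages_header)
    # Page 1 came back with the first response, so start at page 2.
    return list(range(2, total_pages + 1))


if __name__ == "__main__":
    # With X-Pages: 10, nine more requests are needed (pages 2 through 10).
    print(remaining_pages("10"))
```

When X-Pages is 1 the helper returns an empty list, so no further requests are made.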
How to tell if an ESI endpoint is paginated
To find out whether a particular ESI endpoint is paginated, check its swagger spec: a paginated endpoint lists "page" as one of its parameters.
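This check could also be automated against a downloaded copy of the spec. The following is a minimal sketch with several assumptions: the helper name is_paginated is hypothetical, the spec is assumed to be a Swagger 2.0 document already parsed into a dict, and the "page" parameter is assumed to appear either inline by name or as a $ref ending in "/page". The tiny EXAMPLE_SPEC is made up to exercise the helper and is not real ESI data.

```python
def is_paginated(spec, path, method="get"):
    """Return True if the operation lists a "page" parameter.

    spec is a Swagger 2.0 document parsed into a dict.
    Assumption: the parameter appears either inline with name "page"
    or as a $ref whose target ends in "/page".
    """
    operation = spec.get("paths", {}).get(path, {}).get(method, {})
    for param in operation.get("parameters", []):
        if param.get("name") == "page":
            return True
        if param.get("$ref", "").endswith("/page"):
            return True
    return False


# Hypothetical, heavily trimmed spec used only to exercise the helper.
EXAMPLE_SPEC = {
    "paths": {
        "/markets/{region_id}/orders/": {
            "get": {
                "parameters": [
                    {"$ref": "#/parameters/page"},
                    {"name": "region_id", "in": "path"},
                ]
            }
        },
        "/status/": {"get": {"parameters": []}},
    }
}

if __name__ == "__main__":
    print(is_paginated(EXAMPLE_SPEC, "/markets/{region_id}/orders/"))
    print(is_paginated(EXAMPLE_SPEC, "/status/"))
```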
Setting up your Environment
Before moving on you must set up an appropriate environment. In a directory of your choice, make a new virtual environment:
$ virtualenv .venv --python=python3
Activate this virtual environment:
$ source .venv/bin/activate
Next, run the following pip command to install the library needed to run the examples in this blog:
(.venv)$ pip install grequests
Getting Market Orders Synchronously using Python
We're going to first get all market orders in Metropolis (region ID 10000042) in a sequential manner. The following is pseudocode to show what we will be doing:
call https://esi.tech.ccp.is/v1/markets/10000042/orders/
extract the X-Pages header from the return
for page_number in pages - 1
    call https://esi.tech.ccp.is/v1/markets/10000042/orders/?page=page_number+1
combine the data from all calls
If the value returned by X-Pages was 3, then we could visualize this program like so, where each colored block represents a task over time:
Make a new file called get_sequential.py and paste this code inside:
import requests
import time
from requests.exceptions import HTTPError

MARKET_URL = "https://esi.tech.ccp.is/v1/markets/10000042/orders/"


def sync_requests(pages):
    """Fetch pages 2..pages one after the other and time the whole run."""
    responses = []
    start_time = time.time()
    for page in range(2, pages + 1):
        # Each call blocks until ESI has answered before the next one starts.
        req = requests.get(MARKET_URL, params={'page': page})
        responses.append(req)
    end_time = time.time()
    elapsed = end_time - start_time
    print('Elapsed time for {} requests was: {}'.format(len(responses), elapsed))
    return responses


if __name__ == '__main__':
    all_orders = []
    # The first request returns page 1 along with the X-Pages header.
    res = requests.get(MARKET_URL)
    res.raise_for_status()
    all_orders.extend(res.json())
    pages = int(res.headers['X-Pages'])
    responses = sync_requests(pages)
    for response in responses:
        try:
            response.raise_for_status()
        except HTTPError:
            print('Received status code {} from {}'.format(response.status_code, response.url))
            continue
        data = response.json()
        all_orders.extend(data)
    print('Got {:,d} orders.'.format(len(all_orders)))
Make sure you're still inside the virtual environment you created earlier and run this by doing
(.venv)$ python get_sequential.py
This code will display how long it took to complete all the requests needed to get all of the orders and how many orders it retrieved. Here is the output from running it on my machine:
Elapsed time for 9 requests was: 4.024033069610596
Got 91,064 orders.
As we can see, it took about 4 seconds to get 9 pages sequentially (remember that it skips calling the first page again). Why is that? It's because, for every request sent, the requests library waits for the response from the endpoint. Basically, a lot of the time spent is because of network latency. This is easier to visualize by adding wait time to the previous visualization (the gray areas signify wait time):
In this case, "wait time" is considered to be the time that our code is just waiting for a response from ESI. We could instead take advantage of this wait time and let our code perform more tasks that do not depend on the data returned from ESI. If we were to do this, the code would be considered concurrent.
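The effect of overlapping wait time can be demonstrated without touching the network at all. The sketch below is an illustration under stated assumptions: fake_request only sleeps to stand in for network latency, SIMULATED_LATENCY is an arbitrary made-up value, and the overlap is done with a thread pool from Python's standard library purely for demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.05  # seconds; a stand-in for network wait, not a measured value


def fake_request(page):
    """Pretend to call an endpoint: sleep to mimic latency, then return the page."""
    time.sleep(SIMULATED_LATENCY)
    return page


def run_sequential(pages):
    """Wait for each simulated request before starting the next one."""
    start = time.perf_counter()
    results = [fake_request(page) for page in pages]
    return results, time.perf_counter() - start


def run_concurrent(pages):
    """Start every simulated request up front so the waits overlap."""
    start = time.perf_counter()
    # One worker per page, so no request has to queue behind another.
    with ThreadPoolExecutor(max_workers=len(pages)) as pool:
        results = list(pool.map(fake_request, pages))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(2, 11))
    _, seq_time = run_sequential(pages)
    _, con_time = run_concurrent(pages)
    print('Sequential: {:.2f}s, concurrent: {:.2f}s'.format(seq_time, con_time))
```

The sequential run takes roughly the latency multiplied by the number of pages, while the concurrent run takes roughly one latency period, because the waits happen at the same time.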
Getting Market Orders Concurrently using Python
How would retrieving paginated market orders work in a concurrent manner? Which tasks could be done without needing to know about the other? Because the wait time is mostly network related, we could instead start each ESI call one after the other and consume their returns at a later time. Here's pseudocode to show how we will approach this concurrently (comments are preceded by the # symbol):
call https://esi.tech.ccp.is/v1/markets/10000042/orders/
wait for the response from ESI # because we need the value of X-Pages
extract the X-Pages header from the return
for page_number in pages - 1
    call https://esi.tech.ccp.is/v1/markets/10000042/orders/?page=page_number+1
    defer waiting for the response
consume all responses returned from ESI
combine all the orders
Here is a modified version of the previous visualization to represent this:
Remember that this is a simplified visualization and may not represent exactly how these given tasks are scheduled by gevent. Read more about gevent and greenlets here.
How is this done using Python? For this blog we will be using a library called grequests that in turn depends on the libraries gevent and requests. gevent is a networking library for Python that allows network calls to be started, suspended, and resumed independently in an event loop. grequests combines the requests library with gevent, essentially allowing HTTP requests to be started and consumed at different times. Put the following in a file called get_concurrent.py:
import grequests
import time
from requests.exceptions import HTTPError

MARKET_URL = "https://esi.tech.ccp.is/v1/markets/10000042/orders/"


def concurrent_requests(pages):
    """Build requests for pages 2..pages, then send them all concurrently."""
    reqs = []
    start_time = time.time()
    for page in range(2, pages + 1):
        # grequests.get() only builds the request; nothing is sent yet.
        req = grequests.get(MARKET_URL, params={'page': page})
        reqs.append(req)
    # map() sends every request and gathers the responses.
    responses = grequests.map(reqs)
    end_time = time.time()
    elapsed = end_time - start_time
    print('Elapsed time for {} requests was: {}'.format(len(responses), elapsed))
    return responses


if __name__ == '__main__':
    all_orders = []
    # Send the first request on its own because we need the value of X-Pages.
    req = grequests.get(MARKET_URL).send()
    res = req.response
    res.raise_for_status()
    all_orders.extend(res.json())
    pages = int(res.headers['X-Pages'])
    responses = concurrent_requests(pages)
    for response in responses:
        # grequests.map() yields None for a request that failed outright.
        if response is None:
            print('A request failed without returning a response')
            continue
        try:
            response.raise_for_status()
        except HTTPError:
            print('Received status code {} from {}'.format(response.status_code, response.url))
            continue
        data = response.json()
        all_orders.extend(data)
    print('Got {:,d} orders.'.format(len(all_orders)))
As can be seen, grequests does all the heavy lifting as far as the concurrent logic is concerned. The magic of the concurrent calls happens in the grequests.map() function, which is called inside the concurrent_requests function. grequests.map() simply needs request objects so that it can call all endpoints and consume the returns from ESI at a later time. If you're interested in understanding further how grequests.map works you can read the code.
Run this with the following command:
(.venv)$ python get_concurrent.py
and you should get output similar to:
Elapsed time for 9 requests was: 0.554999828338623
Got 91,064 orders.
As we can see, the time it took to make the requests dramatically dropped. When this was done sequentially it took about 4 seconds and when done concurrently it took about 0.5 seconds.
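The same fetch-page-1-then-fan-out pattern could be wrapped in a reusable helper for other paginated endpoints. The following is a sketch under stated assumptions, not an ESI or grequests API: the name fetch_all_pages and its callback contract are hypothetical, it uses a standard-library thread pool instead of gevent so that it runs without extra dependencies, and fake_fetch_page stands in for a real HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_pages(fetch_page, max_workers=10):
    """Collect the items from every page of a paginated endpoint.

    fetch_page(page) must return (items, total_pages), where total_pages
    is the integer value of the X-Pages header. Both this helper and the
    callback contract are hypothetical.
    """
    # Page 1 has to be fetched first because it reveals the page count.
    items, total_pages = fetch_page(1)
    all_items = list(items)
    remaining = range(2, total_pages + 1)
    if not remaining:
        return all_items
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in page order, so items stay ordered.
        for page_items, _ in pool.map(fetch_page, remaining):
            all_items.extend(page_items)
    return all_items


def fake_fetch_page(page):
    """Stand-in for a real HTTP call: three pages of two items each."""
    return ['order-{}-{}'.format(page, i) for i in range(2)], 3


if __name__ == "__main__":
    print(len(fetch_all_pages(fake_fetch_page)))
```

In real use, fetch_page would issue the HTTP request for the given page and read X-Pages from the response headers. The max_workers argument caps how many requests are in flight at once.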
Conclusion
Making calls to ESI in a concurrent way can dramatically speed up your software. Hopefully, the demonstration in this blog can serve as a jumping-off point for you to continue on a concurrent path. There are currently only two other endpoints in ESI that handle pagination with the X-Pages header. The first is the /characters/{character_id}/blueprints/ endpoint, which recently gained pagination and is currently only in the /v2 and /dev namespaces. The second endpoint is /markets/{region_id}/types.
There are other paginated endpoints in ESI but they do not support the X-Pages header as of yet. However, Team Tech Co. plans to expand these endpoints' functionality to use the X-Pages header. These endpoints are:
https://esi.tech.ccp.is/latest/#!/Contacts/get_characters_character_id_contacts
https://esi.tech.ccp.is/latest/#!/Corporation/get_corporations_corporation_id_structures
https://esi.tech.ccp.is/latest/#!/Market/get_markets_structures_structure_id
https://esi.tech.ccp.is/latest/#!/Universe/get_universe_groups
https://esi.tech.ccp.is/latest/#!/Universe/get_universe_types
https://esi.tech.ccp.is/latest/#!/Wars/get_wars_war_id_killmails
o7
CCP Zoetrope
[1] The following endpoints could be considered paginated but operate differently:
https://esi-test.tech.ccp.is/latest/#!/Killmails/get_characters_character_id_killmails_recent uses max_kill_id and max_count to allow you to limit data returned.
https://esi-test.tech.ccp.is/latest/#!/Wars/get_wars uses max_war_id to allow you to limit data returned.
https://esi.tech.ccp.is/latest/#!/Killmails/get_corporations_corporation_id_killmails_recent uses max_kill_id to allow you to limit data returned.
Appendix
The following paths have been added or updated in ESI since the previous blog:
GET /latest/corporations/{corporation_id}/wallets/ (v1)
GET /latest/corporations/{corporation_id}/membertracking/ (v1)
GET /dev/alliances/{alliance_id}/ (v3)
GET /latest/fw/systems/ (v1)
GET /latest/fw/leaderboards/corporations/ (v1)
GET /latest/fw/leaderboards/characters/ (v1)
GET /latest/fw/leaderboards/ (v1)
GET /latest/characters/{character_id}/planets/{planet_id}/ (v3)
GET /latest/characters/{character_id}/notifications/contacts/ (v1)
GET /latest/universe/systems/{system_id}/ (v3)
GET /dev/characters/{character_id}/blueprints/ (v2)
GET /dev/characters/{character_id}/roles/ (v2)
GET /latest/corporations/{corporation_id}/killmails/recent (v1)
GET /latest/corporations/{corporation_id}/wallets/{division}/journal (v1)
GET /latest/corporations/{corporation_id}/divisions/ (v1)
GET /latest/corporations/{corporation_id}/members/limit (v1)
GET /latest/corporations/{corporation_id}/contacts (v1)
Get a sneak peek at what's coming to ESI by watching this board.