RPC Frameworks: gRPC vs Thrift vs RPyC for python
28 Jul 2018
Introduction
The year was 2015. I was writing a bunch of ML training scripts as well as several production scripts. They all needed financial data. Data was spread across multiple tables and multiple datastores. Intraday market data was stored in a Cassandra cluster, while the daily/monthly data was in a MySQL database. Similarly, different types of securities (Future, Option, Stock etc.) were stored in different locations.
So, I decided to make a data library that I can use in my scripts. This data library turned out to be quite popular with my team. It had all the things we needed at that point:
- A single interface for all data types - Futures, Stocks, ETFs, Currencies, Indexes, and Funds from different exchanges.
- Easy to use interface.
- Flexible in terms of the data intervals supported. It worked flawlessly for intraday, daily and monthly time periods.
- It could be used for both live ingestion/consumption as well as historical data requirements.
- It was easy to support a new type of data - for example, macroeconomic indicators.
However, it had some fatal flaws that I could not foresee at that point. Over time, the number of production scripts relying on this library grew rapidly, and the library issued database queries directly. That led to two problems:
- Changing anything in the database would break existing production processes. So, there was no way of changing the database without incurring downtime.
- Additionally, rapidly increasing production processes caused a strain on the database. Because the database access was finely ingrained into the rest of the codebase, it was not possible to optimize or load balance properly.
About a year ago, I was asked if we should convert that library to a service. I brushed it away - not realizing the problems I was going to face in the next year. To be fair, I didn’t fully understand services or microservices at that point - that made me skeptical of their use for something like data fetching. I was still convinced the flexibility and rapid changes would only come from having that code as a library.
But, I finally started taking another look at services a few days ago.
I looked at gRPC, Thrift and RPyC over the past few days. I am summarizing my initial findings in this post. Because I mostly use python for everything, I am approaching these frameworks from that point of view.
You can find the code for the subsequent examples in this repo.
gRPC
gRPC uses Protocol Buffers for serialization and deserialization. It was developed by Google, which released it as open source software while rewriting its internal framework, Stubby. At the moment, several companies including Netflix and Square are using this framework to implement their services.
Let’s jump directly into the simplest example.
We will use the same toy example for all 3 frameworks:
- We will define a service called Time.
- It implements a single RPC call called GetTime.
- GetTime doesn’t take any argument and returns the current server time in string format.
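For reference, this is roughly the plain function that each framework ends up wrapping. The examples in this post use `time.ctime()`, which returns a 24-character string in the server's local timezone. The name `get_time` for the local function and the fixed epoch timestamp are only there for illustration, the latter to show the format deterministically.

```python
import time


def get_time() -> str:
    # Current local time as a human-readable string
    return time.ctime()


# Same format, but for a fixed instant (the Unix epoch in UTC)
example = time.asctime(time.gmtime(0))

if __name__ == '__main__':
    print(get_time())
    print(example)  # Thu Jan  1 00:00:00 1970
```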
Simple gRPC Example
Create a time.proto Protocol Buffers file describing our service.
syntax = "proto3";

package time;

service Time {
  rpc GetTime (TimeRequest) returns (TimeReply) {}
}

// Empty Request Message
message TimeRequest {
}

// The response message containing the time
message TimeReply {
  string message = 1;
}
Now, use the above protobuf file to generate the python files time_pb2.py and time_pb2_grpc.py. We will use them for both our server and client code. Here’s the command to do so (you will need the grpcio-tools python package):
python -m grpc_tools.protoc --python_out=. --grpc_python_out=. time.proto
Create the server script server.py.
import time
from concurrent import futures

import grpc

import time_pb2
import time_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24


class Timer(time_pb2_grpc.TimeServicer):
    # Implements the Time service defined in time.proto
    def GetTime(self, request, context):
        return time_pb2.TimeReply(message=time.ctime())


def serve():
    # Handle incoming RPCs on a pool of up to 10 worker threads
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    time_pb2_grpc.add_TimeServicer_to_server(Timer(), server)
    server.add_insecure_port('[::]:50051')
    # start() does not block, so sleep in a loop to keep the process alive
    server.start()
    try:
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == '__main__':
    serve()
Add client code to the client.py file.
import grpc

import time_pb2
import time_pb2_grpc


def run():
    # Open an unencrypted channel to the server started by server.py
    channel = grpc.insecure_channel('localhost:50051')
    stub = time_pb2_grpc.TimeStub(channel)
    response = stub.GetTime(time_pb2.TimeRequest())
    print('Client received: {}'.format(response.message))


if __name__ == '__main__':
    run()
More Details
gRPC uses HTTP/2 for client-server communication. Every RPC call is a separate stream in the same TCP/IP connection.
gRPC supports four types of RPCs:
- Unary RPC - a single request followed by a single response from the server. Our Time service example uses a unary RPC.
  rpc GetTime (TimeRequest) returns (TimeReply) {}
- Server Streaming RPC - the client sends a single request and gets back a stream of responses to read from.
  rpc GetTime (TimeRequest) returns (stream TimeReply) {}
- Client Streaming RPC - the client writes a sequence of messages and the server replies with a single response.
  rpc GetTime (stream TimeRequest) returns (TimeReply) {}
- Bidirectional Streaming RPC - both sides send a sequence of messages using a read-write stream.
  rpc GetTime (stream TimeRequest) returns (stream TimeReply) {}
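In python these four shapes map naturally onto functions and generators: a streamed response is a generator on the server side, and a streamed request arrives as an iterator. The sketch below mimics those shapes without gRPC itself so that it runs standalone. The `TimeRequest`/`TimeReply` dataclasses are stand-ins for the generated protobuf messages, the handler names and the `count` parameter are hypothetical, and real gRPC handlers additionally receive a `context` argument.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class TimeRequest:
    """Stand-in for the generated (empty) request message."""


@dataclass
class TimeReply:
    """Stand-in for the generated reply message."""
    message: str


def unary(request: TimeRequest) -> TimeReply:
    # One request in, one reply out
    return TimeReply(message=time.ctime())


def server_streaming(request: TimeRequest, count: int = 3) -> Iterator[TimeReply]:
    # One request in, a stream of replies out (a generator)
    for _ in range(count):
        yield TimeReply(message=time.ctime())


def client_streaming(requests: Iterable[TimeRequest]) -> TimeReply:
    # A stream of requests in, one reply out
    received = sum(1 for _ in requests)
    return TimeReply(message='{} requests, last at {}'.format(received, time.ctime()))


def bidi_streaming(requests: Iterable[TimeRequest]) -> Iterator[TimeReply]:
    # A stream in, a stream out: reply to each request as it arrives
    for _ in requests:
        yield TimeReply(message=time.ctime())
```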
gRPC comes with an inbuilt timeout functionality. This is quite handy in practice. Many applications require a response within a certain time interval.
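In the python client, the deadline is passed as a `timeout` keyword argument (in seconds) on the stub call, for example `stub.GetTime(time_pb2.TimeRequest(), timeout=2.0)`. If it expires, the call raises a `grpc.RpcError` whose status code is `DEADLINE_EXCEEDED`. The snippet below is a standard-library analogy of that behaviour rather than gRPC code, so it runs without a server; `slow_get_time` and `call_with_deadline` are hypothetical helpers.

```python
import concurrent.futures
import time


def slow_get_time(delay_s):
    # Pretend the server needs delay_s seconds to answer
    time.sleep(delay_s)
    return time.ctime()


def call_with_deadline(func, timeout_s, *args):
    """Run func(*args) in a worker thread; return None if it misses the deadline."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Do not block the caller waiting for the late worker
        pool.shutdown(wait=False)


if __name__ == '__main__':
    print(call_with_deadline(slow_get_time, 1.0, 0.0))   # answers in time
    print(call_with_deadline(slow_get_time, 0.05, 0.3))  # misses the deadline
```

Unlike this analogy, a gRPC deadline is also propagated to the server, which can stop working on a call that the client has already given up on.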
Pros and Cons
Pros:
- Multiple Language Support for both servers and clients.
- It uses HTTP/2 by default for connections.
- Abundant documentation.
- This project is actively supported by Google and others.
Cons:
- Less flexibility (especially compared to RPyC).
Links:
- Official Website and Tutorial - https://grpc.io/docs/guides/.
- gRPC Concepts.
Thrift
Thrift is quite popular at Facebook and in the Hadoop/Java services world. It was created at Facebook, which later open sourced it; it is now maintained as an Apache project.
Simple thrift Example
Create a time_service.thrift file describing the interface using the Thrift Interface Description Language (IDL).
service TimeService {
    string get_time()
}
Run the following command to generate python code. It will create a gen-py directory. We will use it to build the server and client scripts.
thrift -r --gen py time_service.thrift
Write the following server code in server.py.
import sys
import time

from thrift.protocol import TBinaryProtocol
from thrift.server import TServer
from thrift.transport import TSocket, TTransport

sys.path.append('gen-py')
from time_service import TimeService


class TimeHandler:
    def __init__(self):
        self.log = {}

    def get_time(self):
        return time.ctime()


if __name__ == '__main__':
    handler = TimeHandler()
    processor = TimeService.Processor(handler)
    transport = TSocket.TServerSocket(host='127.0.0.1', port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()
    server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)
    print('Starting the server...')
    server.serve()
    print('done.')
Write the following code in client.py.
import sys

from thrift import Thrift
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

sys.path.append('gen-py')
from time_service import TimeService


def main():
    # Make socket
    transport = TSocket.TSocket('localhost', 9090)
    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)
    # Wrap in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    # Create a client to use the protocol encoder
    client = TimeService.Client(protocol)
    # Connect!
    transport.open()
    ts = client.get_time()
    print('Client Received {}'.format(ts))
    # Close!
    transport.close()


if __name__ == '__main__':
    try:
        main()
    except Thrift.TException as tx:
        print('%s' % tx.message)
Simple thriftPy Example
thriftPy seems to be more popular than the default python support. It also solves some common issues with the default python support - this includes a more pythonic approach to creating server and client code. For example, check out the following server and client code:
Server code
import time

import thriftpy
from thriftpy.rpc import make_server


class Dispatcher(object):
    def get_time(self):
        return time.ctime()


time_thrift = thriftpy.load('time_service.thrift', module_name='time_thrift')
server = make_server(time_thrift.TimeService, Dispatcher(), '127.0.0.1', 6000)
server.serve()
Client code
import thriftpy
from thriftpy.rpc import make_client
time_thrift = thriftpy.load('time_service.thrift', module_name='time_thrift')
client = make_client(time_thrift.TimeService, '127.0.0.1', 6000)
print(client.get_time())
Pros and Cons
Pros:
- Thrift supports the container types list, set and map. It also supports constants. This is not supported by Protocol Buffers. However, rpyc supports all python and python library types - you can even send a numpy array in an RPC call. (Edit: proto3 supports those types too. Thanks Barak Michener for pointing this out.)
Cons:
- Python doesn’t feel like a primary language for Thrift. Having to add sys.path.append('gen-py') doesn’t make for a smooth python experience.
- Documentation and online discussions seem relatively scarce compared to gRPC.
RPyC
RPyC is a pure python RPC framework. It does not support multiple languages.
If your entire codebase is in python, this could be an easy and flexible framework for you.
Simple rpyc Example
server.py
import time

from rpyc import Service
from rpyc.utils.server import ThreadedServer


class TimeService(Service):
    def exposed_get_time(self):
        return time.ctime()


if __name__ == '__main__':
    s = ThreadedServer(TimeService, port=18871)
    s.start()
client.py
import rpyc
conn = rpyc.connect('localhost', 18871)
print('Time is {}'.format(conn.root.get_time()))
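Note that the server method is named `exposed_get_time` while the client calls `get_time`: RPyC only makes attributes carrying the `exposed_` prefix reachable from the other side, and the client may leave the prefix out. The toy dispatcher below imitates that naming rule in plain python to make it concrete. It is not RPyC's implementation; `ToyRoot` and `internal_helper` are hypothetical names.

```python
import time


class TimeService:
    def exposed_get_time(self):
        # Reachable from the client because of the exposed_ prefix
        return time.ctime()

    def internal_helper(self):
        # No prefix, so the toy dispatcher refuses to hand it out
        return 'secret'


class ToyRoot:
    """Hypothetical stand-in for conn.root: maps name -> exposed_<name>."""

    def __init__(self, service):
        self._service = service

    def __getattr__(self, name):
        # Only called for names not found on ToyRoot itself
        target = getattr(self._service, 'exposed_' + name, None)
        if target is None:
            raise AttributeError('{!r} is not exposed'.format(name))
        return target


if __name__ == '__main__':
    root = ToyRoot(TimeService())
    print('Time is {}'.format(root.get_time()))
```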
Pros and Cons
Pros:
- Probably the easiest to get started. No need to understand Protocol Buffers or Thrift syntax.
- Extremely flexible. No need to formally use IDL (Interface Definition Language) to define the client-server interfaces. Simply start implementing your code - it embraces python’s Duck Typing.
Cons:
- Lack of multiple client languages.
- Lack of formally defined service interface can potentially cause maintenance issues if the codebase becomes large enough.
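One way to recover part of what an IDL gives you, while staying in pure python, might be to write the expected interface down as an abstract base class and have every service implementation inherit from it. This is only a sketch of that idea and not an RPyC feature; `TimeServiceContract`, `GoodService`, `BrokenService` and `can_instantiate` are hypothetical names.

```python
import abc
import time


class TimeServiceContract(abc.ABC):
    """Hypothetical contract that every time service must satisfy."""

    @abc.abstractmethod
    def exposed_get_time(self) -> str:
        ...


class GoodService(TimeServiceContract):
    def exposed_get_time(self) -> str:
        return time.ctime()


class BrokenService(TimeServiceContract):
    # Typo in the method name: the abstract method is never implemented
    def exposed_get_tmie(self) -> str:
        return time.ctime()


def can_instantiate(cls) -> bool:
    # A class with unimplemented abstract methods raises TypeError on creation
    try:
        cls()
        return True
    except TypeError:
        return False
```

With this in place, a misspelled or missing method fails when the service object is created, instead of surfacing later as a failed remote call.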
gRPC vs Thrift vs RPyC comparison matrix
Let me summarize my experiences with each framework here.
| | gRPC | Thrift | RPyC |
|---|---|---|---|
| Getting Started | | | |
| Documentation | | | |
| Language Support | C++, Python, ... | C++, Python, ... | Python only |
| Maintenance | | | |
| Streaming | | | |
| Can work without IDL | No | No | Yes |
Notes on the above table:
- I found it relatively hard to get the basic Thrift example working. The few python examples I found were targeted at older Thrift versions (and python2).
- My opinion on “Maintenance” is based on the fact that RPyC doesn’t have an IDL (gRPC uses protobuf, Thrift uses Thrift IDL) - it embraces duck typing. While this makes it really easy to get started, it can be a bad thing when it comes to maintenance.
My preferences are:
- I would personally prefer to use RPyC if python is the only language I am going to use.
- I would prefer to use gRPC if I needed robustness, reliability, and scalability from my services.
- The best thing about Thrift is that it supports so many languages. If that’s what you’re targeting, go for Thrift.
Other important things to note:
- I did not compare speed. This might be the most relevant criterion for some.
- I do not have experience with very large services. I am not the right person to comment on maintainability of each framework. However, this is an important criterion to decide which RPC framework to choose.
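If speed matters for your use case, a small harness like the one below could be a starting point for measuring it yourself. This is only a sketch: `measure_latency` is a hypothetical helper, and its `call` argument is expected to be whatever zero-argument client call you want to time. Here it is exercised with a local function so that the snippet runs without a server.

```python
import statistics
import time


def measure_latency(call, repeats=100):
    """Time call() `repeats` times and return (mean, median) in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.median(samples)


if __name__ == '__main__':
    mean_s, median_s = measure_latency(time.ctime, repeats=1000)
    print('mean={:.6f}s median={:.6f}s'.format(mean_s, median_s))
```

For the RPC clients above, passing something like `lambda: client.get_time()` with the connection opened once beforehand would time the call itself and not the connection setup.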
You can find the code for the above examples in this repo.