getlabeltext.com

Toolkits not frameworks

Choosing a technology stack to build microservices is not an easy task. Large, batteries-included frameworks like Spring, Rails or Django don’t feel like they fit the bill anymore. The Spring framework, for example, feels like an antiquated and excessive choice for building microservices. Startup times easily get sluggish as your dependency graph grows more complex, and just using Hibernate/JPA makes it unbearably slow to start. Using Spring also comes with a mental overhead, not just a startup one: it aims to facilitate more enterprise-y and monolith-y applications, so even simple stuff requires either some expertise or some wading through documentation a bit too often....

Choosing a stack

With the system architecture more or less finalized, at least the first version of it, I now need to make a decision about the stack of the system. [Figure: getlabeltext.com microservice architecture diagram v0.1] Obviously some parts will be pure Python: the classifiers are built on Pandas and Scikit. The rest of the system will probably need to be built on something more manageable: Flask is not a good web framework and Django is probably overkill....

Microserving large datasets

Getlabeltext.com is a cloud text classifier. Text classifiers are nothing more than statistical functions that accept text as input and produce a label as output. To do that, text classifiers rely on an internal “database” of text tokens and their frequencies. This “database” is built during the training phase and can be reused for every run thereafter. Training classifiers is time-consuming and usually requires large amounts of data, which means getlabeltext....
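
To make the token-frequency “database” idea concrete, here is a minimal sketch in plain Python. It is not the getlabeltext.com implementation (which is built on Pandas and Scikit); it assumes a simple naive Bayes classifier with add-one smoothing, and the names `tokenize`, `train` and `classify`, as well as the toy training data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Build the token-frequency "database" from (text, label) pairs."""
    token_counts = defaultdict(Counter)  # label -> token -> count
    label_counts = Counter()             # label -> number of training texts
    for text, label in examples:
        label_counts[label] += 1
        token_counts[label].update(tokenize(text))
    return {"tokens": token_counts, "labels": label_counts}


def classify(model, text):
    """Return the most likely label for text (naive Bayes, add-one smoothing)."""
    vocab = {t for counts in model["tokens"].values() for t in counts}
    total_docs = sum(model["labels"].values())
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        counts = model["tokens"][label]
        total = sum(counts.values())
        # Log prior for the label plus the log likelihood of each token.
        score = math.log(n_docs / total_docs)
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, made up for illustration only.
model = train([
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
])
print(classify(model, "buy cheap pills"))
```

The expensive step is `train`; the dictionary it returns plays the role of the “database” and is the part that would be stored once and loaded again on every later run, while `classify` only reads from it.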