Every time I have to deal with the Thumbor storage configuration options, I get really confused. For my own reference, here is what I need to know:
The key is in the terminology. What Thumbor calls “storage” is an entirely different thing from what it calls “result storage”, and “upload storage” is yet another thing.
Storage caches a) source files, b) “crypto data”, and c) the results of the “feature detectors” (see the very clean base class). You can set the storage class to the Mixed Storage implementation and then use a separate storage class for each of the three types of data (see the config sketch after this list).
Result storage saves the processed results.
Upload storage is what backs Thumbor’s upload feature.
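To make this concrete, here is a sketch of how the relevant thumbor.conf settings fit together, as I understand them. The setting names are from the Thumbor docs, but double-check them against your version:

```python
# thumbor.conf is itself a Python file.

# "Storage": caches source files, "crypto data" and detector results.
# Pointing it at the mixed implementation lets you split the three.
STORAGE = 'thumbor.storages.mixed_storage'

# One backend per type of data (the MIXED_STORAGE_* settings):
MIXED_STORAGE_FILE_STORAGE = 'thumbor.storages.file_storage'
MIXED_STORAGE_CRYPTO_STORAGE = 'thumbor.storages.no_storage'
MIXED_STORAGE_DETECTOR_STORAGE = 'thumbor.storages.redis_storage'

# "Result storage": saves the processed results.
RESULT_STORAGE = 'thumbor.result_storages.file_storage'

# "Upload storage": backs the upload feature.
UPLOAD_ENABLED = True
UPLOAD_PHOTO_STORAGE = 'thumbor.storages.file_storage'
```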
I’ve been experimenting with Docker and its ecosystem for a while, and my setup has become a bit of a mess: different machines running various old versions of Docker and various generations of custom scripts to manage them. It was time for an overhaul, and I set out to have a closer look at the tools out there.
It’s kind of a mess. Everyone wants to release an orchestration tool, and where those tools sit in the stack overlaps all over the place.
So let’s consider different parts an orchestration system might cover:
A scheduler that will run the containers. The most bare-bones version is a CLI script that runs imperatively; it might be upstart or systemd on single-host systems, or a networked cluster scheduler like Swarm. A cluster scheduler essentially needs a process that runs on every host. A lot of full-stack orchestrators have their own scheduler (say Tutum, now Docker Cloud), but there are also standalone schedulers like fleet that you can harness.
A networking solution to let containers talk to each other. On multi-node clusters, this will be some sort of overlay network; your cloud provider might offer one. Even on a single host, you want to let services talk to each other while only exposing particular services to the public (say, the HTTP router). In the early days of Docker, the answer was host mapping.
A service discovery solution. You want your web app container to talk to the MySQL container, so you have to know its address. These days, pretty much everyone seems to use a) a cluster-internal network that every service has an address on, b) DNS-based resolving, usually built right into the networking layer, and c) “links”, in the sense that the DNS name to connect to is injected via environment variables (see the sketch after this list). In the early days of Docker, this was messy: we used various tools to register services when they start (sdutil, registrator) and to query the service discovery and link one service to another (sdutil, ambassadord); frequently, host mapping to random ports was involved. See also my previous post on this.
A proxy to route to the services you want to expose. In simple cases, you can just publish your webapp on port 80 directly, but if you have two apps, you need to route to them based on their respective domains. Because the router needs to know the addresses of the backends, it might integrate with service discovery.
Developer tools, for example deploying an app on every push to the repo.
Cluster-wide persistent storage. Unless you’re on a cloud provider, I consider this still unsolved, despite various Docker volume plugins. It’s just very hard to set up.
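To illustrate the service discovery point from above: with DNS-based resolving, application code simply uses the service name as a hostname; with old-style links, it reads the injected environment variables. A minimal sketch, where the mysql service name and the MYSQL_PORT_3306_TCP_ADDR variable follow Docker’s old links convention, and PyMySQL is just an example client:

```python
import os

import pymysql  # example client; any driver works the same way

# Modern setup: the overlay network's DNS resolves the service name,
# so the hostname is simply whatever the service is called.
host = "mysql"

# Old-style links: Docker injected the peer's address via environment
# variables such as MYSQL_PORT_3306_TCP_ADDR. Fall back to DNS if unset.
host = os.environ.get("MYSQL_PORT_3306_TCP_ADDR", host)

conn = pymysql.connect(host=host, user="app", password=os.environ["DB_PASSWORD"])
```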
Before I look at the full-stack tools, here are some of the implementations that focus on particular layers in that stack:
The Swarm scheduler that you run as a Docker container, now presumably deprecated with 1.12.
Docker Swarm Mode
The new swarm mode built directly into Docker. It is incredibly simple to use.
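As a taste of how little is involved, here is a sketch using the Python Docker SDK; the CLI equivalents are docker swarm init and docker service create, and the image and address are placeholders:

```python
import docker

client = docker.from_env()

# Turn this engine into a single-node swarm (CLI: `docker swarm init`).
client.swarm.init(advertise_addr="192.168.99.100")

# Run a replicated service on it (CLI: `docker service create`).
client.services.create(
    image="nginx:latest",
    name="web",
    mode=docker.types.ServiceMode("replicated", replicas=3),
)
```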
A networked systemd. CoreOS-specific.
The network overlay that gives every container its own private IP is clearly the winner here. A lot of orchestrators have their own solution; generic ones include weave and flannel.
A lot of orchestration tools naturally target ops and don’t deal with this part. There are basically two approaches:
A service handles git pushes, runs the code through slugbuilder, and stores the slug as a tar.gz somewhere. To run it, the blob is given to a slugrunner image. In other words, your build artifacts use the Heroku slug format, and there is a custom system to hold the version history.
You build every version of your app into a Docker image directly and use your Docker registry for version management. In the simplest case, you just set up a GitHub webhook and let Docker Hub build the image.
A distributed filesystem.
Docker volume plugin; too enterprisey for me.
Written as part of Rancher, and integrates nicely there; outside of it, the instructions are poor. Does support NFS and block devices.
Supports NFS, CIFS.
Now, let’s look at some full-stack approaches and where they fall in the stack:
Flynn literally implements the whole stack itself, and exposes everything through a very limited, thin, Heroku-like API.
What do I think?
Tutum was bought by Docker and rebranded.
Essentially, Docker Cloud is:
a) a scheduler,
b) an assembly of some existing tech for you (weave), and
c) a UI that is a thin interface on top of Docker itself (you still interact with containers and their config a lot), including the “stack” abstraction (a collection of multiple services).
Regarding the proxy: the idea of using battle-tested haproxy is nice, but in practice I continuously ran into issues. It often required a restart when updating or changing services. It’s also limited in that it requires defining HTTPS + HTTP URLs and cannot do redirects. It requires manually linking the proxy container to all services, and if a reload fails (say, due to an issue with an SSL cert), all of the sites will be down.
I also wonder what will happen to Docker Cloud now that the Docker daemon itself implements essentially everything Docker Cloud offers, but with different tech (stacks and services as an abstraction, a network overlay, a scheduler). It might end up being just a UI on top of the Docker daemon.
The idea of different backends is nice, but in practice, Rancher doesn’t paint over the differences. In other words, whatever backend you choose, the frontend you work with will be different, too. The “app catalogs” they offer are separate too. So it’s basically four different products, and not all of them have the same quality. I see a lack of focus here.
What do I think?
Docker used to be just the engine. Then they added Swarm as a separate scheduler, a native network overlay, and docker-compose as a dev tool. I already talked about Docker Cloud.
Now with 1.12, Docker itself has the swarm scheduler built in, and understands a “service” abstraction. Just everything.
While it uses Docker as the basic container runner, unlike other tools it doesn’t expose it at all. You are dealing with a custom CLI and custom abstractions, and there are *a lot* of them: Ingress resources, secrets, its own volume system. For example, a “service” in Kubernetes doesn’t actually need to run on Kubernetes; other apps can refer to the service without knowing whether it runs inside or outside of the cluster. Or think about the fact that for scaling, you don’t just say replicas=3; it’s abstracted inside a “replication controller”.
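To make that concrete: the replica count is declared in the replication controller’s manifest, not passed as a flag to any individual container. Roughly, a trimmed v1 ReplicationController, shown here as a Python dict instead of the usual YAML:

```python
# Scaling lives in spec.replicas, owned by the replication controller;
# the controller then keeps three pods matching the selector running.
replication_controller = {
    "apiVersion": "v1",
    "kind": "ReplicationController",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "selector": {"app": "web"},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [
                    {"name": "web", "image": "nginx:latest"},
                ],
            },
        },
    },
}
```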
What do I think?
How does it work inside?
This basically adds a Heroku-like “git push to deploy” workflow on top of Kubernetes.
I like the idea. Build on top of an existing scheduler, provide all the pieces for development.
I didn’t look into this one too closely, because I somehow couldn’t communicate with it. It seems that:
Getting the cert
Basically: run Let’s Encrypt with ‘--standalone’. For validation, Let’s Encrypt will try to fetch a file on your domain (under /.well-known/acme-challenge). The proxy redirects that path to the container you just started (which might need to have a fixed address/IP, or the proxy needs to find it via the regular service discovery mechanism you are using).
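A minimal sketch of that first step, simply wrapping the CLI from Python; the domain and email are placeholders, and the flags are the standard letsencrypt/certbot ones:

```python
import subprocess

DOMAIN = "example.com"  # placeholder

# --standalone makes the client spin up its own temporary web server to
# answer the /.well-known/acme-challenge request; the front proxy must
# route that path to this container while the command runs.
subprocess.check_call([
    "letsencrypt", "certonly",
    "--standalone",
    "-d", DOMAIN,
    "--email", "admin@" + DOMAIN,  # placeholder contact address
    "--agree-tos",
])
```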
Installing a cert
Renewing a certificate
I recently needed a performant way to manage a tree in the database, i.e. using MPTT, nested sets, etc.
I looked at the following libraries, which integrate with SQLAlchemy. These are my notes, some of which seem to be outdated now.
Adds a children relationship into the model by itself, without allowing customization.
rebuild() is not able to init a tree from scratch.
No rebuild() feature at all.
Offers tree_recursive_iterator() for querying a tree.
sqlamp and sqlalchemy-orm-tree seem to share a lot of ideas/APIs.
sqlamp seems to be the most stable / actively developed, although sqlalchemy-orm-tree makes a good impression, too, and its tree move detection is so helpful that I decided to go with it.
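For reference, the basic pattern with sqlamp looks roughly like this. The API is from the sqlamp docs as I remember them (sqlalchemy-orm-tree is similar in spirit), so verify against the current documentation:

```python
import sqlamp
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

# sqlamp hooks in via a custom declarative metaclass.
BaseNode = declarative_base(metaclass=sqlamp.DeclarativeMeta)

class Node(BaseNode):
    __tablename__ = 'node'
    __mp_manager__ = 'mp'  # exposes the materialized-path manager as Node.mp
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('node.id'))
    parent = relationship('Node', remote_side=[id])
    name = Column(String)

engine = create_engine('sqlite://')
BaseNode.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

root = Node(name='root')
session.add(root)
session.commit()

session.add(Node(name='child', parent=root))
session.commit()

# Subtree queries run against the materialized path, without recursion.
for node in root.mp.query_descendants(session):
    print(node.name)
```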
I’ve run into this Thunderbird issue a number of times, always with a Dovecot IMAP server:
There are a lot of people complaining about this, and various workarounds have been suggested (1, 2), but I’ve never seen a clear explanation of why it doesn’t work, who is really at fault, and more importantly, why it isn’t fixed.
The best answer I found is that Thunderbird’s implementation of the CONDSTORE IMAP feature (which allows more optimized syncing) is broken and isn’t getting fixed since Thunderbird development is basically dead, and that disabling CONDSTORE, either on the server (e.g. in Dovecot) or in Thunderbird (disable the “use_condstore” setting in the advanced config editor), is the best way to fix it.