What an IaaS service is. And what it is not.

The term “Cloud Computing” has been openly used for almost ten years now, but there are still some misconceptions around the concept itself and around some more specific words like “IaaS” (Infrastructure as a Service).

Sometimes I end up in pointless discussions with people who have completely wrong ideas and expectations: this is annoying from my point of view, but it can be catastrophic for organizations that decide to make “the big move” without having fully understood what the cloud is all about.

If you have come across this post while you’re still trying to figure out what “Cloud Computing” and “IaaS” mean, then let me save your life and probably your job with some clarifications.

The market isn’t helping us, as service providers are confused as well and label completely unrelated products as “IaaS”. The US NIST has released a document containing a list of 5 “Essential Characteristics” of cloud services, but they are not very specific and won’t help you make a choice.

When words are used in such a confused way, you have to decide which of the many interpretations is the “authoritative” one: my authorities for this article are Amazon Web Services (not because I work for them, but because ten years ago they were the first to offer an IaaS platform) and OpenStack (that is, AWS concepts and terms as reviewed by the biggest open source community in the cloud computing world).

So, what should you expect, or not expect, from an IaaS offering?

  • You should expect to be billed on a pay-as-you-go model. Let’s be serious: if you have to pay a one-time or monthly fee for your account and/or the services you are using, then this is not really cloud. Offering pay-as-you-go services is a real technical challenge for the service provider, and if they aren’t giving you this option then you should have some doubts about them being up to date with the technology. Some providers will offer you discounts on long-term commitments and this is fine, but always look for the PayG option, please.
  • You should expect to have full access to API and CLI tools and not just to a GUI. This is critical even if you are not planning to use them from the beginning. Cloud is all about automation, and if you stick with a service that only offers a GUI, then you will be forever bound to your mouse (and hands): if you come from an on-premises physical server environment you might not see my point right now, but in the cloud you will start using automation soon, at least in its basic form. Because it’s easy and useful.
  • You should not expect your instances (virtual machines) to be always available. This is something I already blogged about a few years ago (in Italian, I’m sorry) but it’s still one of the biggest, most widespread and most dangerous misconceptions. Cloud services are based on commodity hardware, and thus the instances on top of it should be treated in the same way, as a commodity. A single instance may or may not be there, and your customers must not notice: you have to plan for high availability at the application level, taking into account the various kinds of failure. Some additional services like Block Storage, Object Storage and Load Balancing as a Service will help you achieve the high levels of availability you need. If your service provider is offering you an extreme level of HA, then you’re probably paying for something you don’t need (if you’re using 5 web nodes, then what’s the matter if one of them goes down for a while?).
  • You should expect instant provisioning: seriously, provisioning has to happen in seconds. Be careful not to underestimate this: you could be happy with a 24-hour delivery time for your first batch of servers, but believe me, you won’t be when you need to scale rapidly because of a traffic peak. Maybe I’m being too picky here, but I expect the provisioning of my account to happen in real time as well: I’m not happy with providers asking me to send a physical signed contract or my IDs before using their service.
  • You should expect the service you choose not to have limits that could (and will) impact you. Okay, not all of us need the scale of AWS, but make sure your provider won’t run out of capacity when you need it: planning for infrastructure is their job, and from your point of view you must always be able to use the resources you need, when you need them, with no previous commitment.
  • You should (probably) expect to have access to multiple autonomous regions: whether for active-active HA or just for backup and disaster recovery purposes, it doesn’t make much sense to choose a provider that hosts its entire platform in a single datacenter. Yes, you could choose to use 2 different service providers hosting services in different locations, but this is not going to be easy to deal with.
  • You should (probably) expect not to be locked in by small-scale service providers: always look for open standards, especially if the company you’re buying resources from is still at a scale where going out of business from one day to the next is a (remote) possibility.
  • You should not expect to be able to easily scale vertically (increase instance size, or a single resource inside the instance): cloud computing is based on horizontal scalability (that means adding building blocks, not making the existing ones bigger), and this is why service providers don’t focus so much on hot resizing of instances or on the ability to add RAM without modifying anything else. This is related to availability as well: if you can’t afford a planned downtime on a single instance in your infrastructure, then you’re doing something wrong.
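
To make the point about availability at the application level a bit more concrete, here is a minimal Python sketch of a client that simply moves on to the next web node when one is down. Everything in it is an assumption made for illustration: the node names, the `call_with_failover` helper and the `fake_request` stand-in are hypothetical, and a real setup would more likely rely on a load balancer with health checks.

```python
from typing import Callable, Sequence


class AllNodesFailed(Exception):
    """Raised when no node in the pool could serve the request."""


def call_with_failover(nodes: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each node in turn and return the first successful response."""
    errors = []
    for node in nodes:
        try:
            return request(node)
        except ConnectionError as exc:
            # A single instance being down is expected: note it and move on.
            errors.append(f"{node}: {exc}")
    # Only when every node has failed does the caller see an error.
    raise AllNodesFailed("; ".join(errors))


def fake_request(node: str) -> str:
    """Hypothetical stand-in for a real HTTP call; 'web-2' pretends to be down."""
    if node == "web-2":
        raise ConnectionError("instance unavailable")
    return f"200 OK from {node}"


if __name__ == "__main__":
    pool = ["web-2", "web-1", "web-3", "web-4", "web-5"]
    print(call_with_failover(pool, fake_request))
```

The caller never learns that one node was missing, which is the whole idea: losing one instance out of five is a non-event as long as something above the instances knows how to route around it.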

That’s it, at least for now. I’m sure moving to the cloud is the right choice for almost every company in the world, but please make sure you fully understand it before making any choice. Really.

Giorgio

Time to clean things up.

It was largely unexpected, but yesterday’s post was an enormous success. Okay, nothing compared to The Blonde Salad’s posts, but I wasn’t expecting at all to get 500 visits in a couple of hours on a blog that I considered dead & forgotten.

This means it’s time to focus on improving your experience on this website. The weather in Ireland, where I currently am, is really helping me focus on my blog:

[Photo: img_2570]

What I’ve done so far, in detail:

  • HTTPS: I finally completed the SSL integration. All static links have been modified to use HTTPS, and any HTTP URL now redirects to its SSL version.
  • Categories & Tags: I wasn’t using categories and relied on tags to categorize my posts. After a few years the tag cloud had become a real mess, so I spent a few hours cleaning it up and reducing the number of tags per article. From now on, no post will have more than 5 tags, and each will belong to 2 categories: the real category, and a second one (English, Italian) based on its language.
  • Caching: WordPress wasn’t performing at its best, so I tuned W3 Total Cache and switched to Memcache as its backend. It’s much better now.
  • Permalinks: It turns out my permalinks are not so permanent. I’ve modified some titles and URLs, so you should expect to run into 404 errors for the next few days if you’re getting here via Google or old links.
  • MySQL: Yes, believe it or not, I was still using MySQL 5.5. I switched to MariaDB 10.1, and I’m in the process of tuning it: you should expect some brief downtime in the next few hours while I restart services.

That’s it, for now at least.

Stay tuned!

Giorgio is back!

Yes, I’m back: this blog has been abandoned for like three years now, and I feel it’s time to bring it back to life. There is no particular reason behind this choice: I just need a virtual place where I can express my ideas and aggregate content I’ve always been disseminating over the internet for free (comments, forum posts, and so on).

New life means new theme (still in its basic version) and new language: some of my old posts are being read through Google Translate, and as my main language in the last few years (mainly due to my jobs and relationships) has been English, I have no reason at all to keep posting in Italian. It would just restrict my audience.

This time I won’t do what I did in all the previous “renovations”: I won’t destroy the old posts. The first one dates back to 2009 and I think they are too important a piece of history to disappear from the internet. I’m recovering the backups of the old versions of this blog in order to merge them with this one.

So, what has changed in those three years?

First, and maybe the most important choice to date, I decided to put on hold (and then completely abandon) my studies at the University (Politecnico di Milano). This choice was strongly dictated by the context (I was attending in Italy): although I perfectly understand the importance of learning the basics and developing a method for “doing things”, I felt what I was studying was too far behind reality. Spending years and thousands of euros to end up working as an underpaid intern in some big company was definitely not what I was expecting from my life.

The networking manual we were using (please mind it was 2013, and it was still being printed and widely adopted) at a certain point stated that Ethernet was being superseded by FastEthernet, and that some big ISPs were deploying experimental long haul GigabitEthernet links. This was way too much (for non-technical people reading this post: in 2013 we were already in the Terabit/s era, with multiple 100GigabitEthernet links, 100 times GbE, being used in long haul transits). Reading this sentence, and then seeing people get the best marks at the exam while thinking that GbE was the future (and not the past), helped me realize how detached from reality we were.

I decided to stop wasting time and joined CloudAcademy, a company that is trying to explain and show people how to take advantage of cloud services, as the Training Paths Supervisor. Feeling I had to head back to the battlefield, a few months later I decided to move to Enter, an Italian ISP/CSP which at the time (late 2013) was working on the launch of a new multi-region OpenStack-based IaaS service, Enter Cloud Suite.

At Enter I was employed first as a Cloud Architect and then as the Head of Cloud Architecture, with ECS as the main focus: I spent two and a half years designing and implementing hosting infrastructures for large scale news and e-commerce websites, and designing, implementing and sometimes managing the OpenStack infrastructure behind Enter Cloud Suite. I was focused on the networking stack (both physical and overlay), and this gave me the opportunity to meet some very interesting companies like Cumulus Networks and Mellanox.

Then, in the first months of 2016, Amazon Web Services called: they offered me a position as a Technical Account Manager in London, and I decided to accept it and move from Milan. Everything happened so quickly that I have yet to fully realize what this means.

It’s very hard to explain what it feels like to be part of such a fast growing company, the one that has been the reference for your entire working life. “Work Hard. Have Fun. Make History.” is our slogan, and it is what this is all about: I’m sitting in the buildings where history is being written, day by day.

That’s it. This is the story of how I ended up writing this post, while lying on the bed in my apartment in Canary Wharf.

This is definitely a new beginning, and not just for this blog.

[Photo: img_2230]

As you wait for the next post, please enjoy the view from my bedroom.

Giorgio

No, I don’t deal in cosmetics.

It seems that a certain Giorgio Bonfiglio is contacting people online, offering work-from-home jobs packaging cosmetics. He asks for a €35 deposit, to be paid via a Postepay top-up, for shipping the first package of materials, and of course once he has received the deposit he never ships the package.

I did some research, and it seems the only place it is being discussed is on this site, in the comments. My charming namesake doesn’t seem to be new to this kind of activity: I read that he was already arrested about ten years ago for putting a certain number of forged checks into circulation.

The problem is not so much the shared name as the fact that, for anyone searching Google for what is also MY name, I take up almost the entire first page of results. Finding my contact details is then very easy. I have received a phone call, and hundreds of people have landed on this blog through keywords related to these events: I hadn’t noticed, and only worked it out after the fact.

Let it be clear to everyone: that is not me. I work in infrastructure engineering, I’m not a dental technician. In 2003 I wasn’t 48 years old, and above all I have no passion for cosmetics.

Get that into your heads.

Design for failure

“Sorry, but we’re not able to offer you the stability you need, could you take care of it yourself?”

(Anonymous Cloud Heretic on Design for Failure)

Once upon a time there were… developers, and applications. Developers focused on code, taking the stability and scalability of the underlying infrastructure for granted: they wrote unspeakable SQL queries, and it was the sysadmin’s job to make them run fast; they wrote code with no exception handling, because it was the sysadmin’s job to make sure that particular database server was always available and never returned errors. They wrote software that was impossible to distribute across multiple machines, because the sysadmin would somehow make it work anyway.

Whose fault was every slowdown or malfunction? The sysadmin’s. This led those who deal with infrastructure to design increasingly advanced solutions to let the application survive the most unimaginable catastrophes without ever suffering a malfunction. Whatever misfortune befell the machines serving it, the application had to stay up and running.

It must be said that we (almost) succeeded. Thanks to virtualization we managed to create what is, to all intents and purposes, indestructible hardware: the physical machine became virtual, and we know how to move a virtual machine between nodes without shutting it down, at the sole cost of a few milliseconds of freeze.

We thus built platforms that almost completely abstracted away the underlying complexity, using virtual processors that stayed available even if the physical ones caught fire, and virtual disks that kept serving data even if the entire storage rack was stolen by aliens.

This solution was not optimal, though: synchronous replication, for example, was only possible within limited geographical areas. The cost of these solutions was often prohibitive, and their complexity high and unnecessary. These structures, however immortal they might be, were always under the same administrative authority. All to avoid giving developers one more task: managing the availability of the application.

[Image: Screen Shot 2014-01-28 at 20.58.15]

(Anonymous Cloud Heretic who hasn’t understood ‘Design for Failure’)

Then a new generation of developers arrived: developers who wanted more control, who wanted to decide how the application would react to infrastructure failures, and above all who refused to pay the provider for complex failover mechanisms because… they didn’t need them. They knew how to do better, and how to do it more cheaply but above all more effectively and more simply.

These developers no longer asked infrastructure vendors for immortal hardware; they simply asked for iron: of any type, performance, shape, color and size, and in any place. They would take care of forwarding fewer requests to the less powerful processors, and of keeping data in RAM if the machine’s disks were too slow. They would take care of not querying a database server that had stopped responding.
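
As a rough illustration of what “taking care of it themselves” can look like, here is a minimal Python sketch that sends fewer requests to weaker machines and stops picking a backend for a while after it has failed. The `Backend` class, the `pick_backend` function, the weights and the 30-second cooldown are all hypothetical choices made for this example, not something taken from a specific library or product.

```python
import random
import time
from typing import Optional, Sequence


class Backend:
    """A server with a relative capacity (weight) and a simple health flag."""

    def __init__(self, name: str, weight: int, cooldown: float = 30.0):
        self.name = name
        self.weight = weight          # higher weight = more powerful machine
        self.cooldown = cooldown      # seconds to leave a failed backend alone
        self.down_since: Optional[float] = None

    def mark_failed(self, now: Optional[float] = None) -> None:
        """Record that a request to this backend just failed."""
        self.down_since = time.monotonic() if now is None else now

    def is_available(self, now: Optional[float] = None) -> bool:
        """A failed backend becomes eligible again once the cooldown has passed."""
        if self.down_since is None:
            return True
        now = time.monotonic() if now is None else now
        return now - self.down_since >= self.cooldown


def pick_backend(backends: Sequence[Backend], rng=random,
                 now: Optional[float] = None) -> Backend:
    """Pick an available backend, proportionally to its weight."""
    candidates = [b for b in backends if b.is_available(now)]
    if not candidates:
        raise RuntimeError("no backend available")
    weights = [b.weight for b in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    pool = [Backend("db-big", weight=3), Backend("db-small", weight=1)]
    pool[0].mark_failed()             # pretend db-big stopped answering
    print(pick_backend(pool).name)    # only db-small is eligible for now
```

The application, not the infrastructure, decides how much load each machine gets and when a silent server is worth trying again. A production version would also need to reset the failure state after a successful request and handle concurrent access.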

Above all, they wanted to handle disaster recovery themselves in case an entire datacenter went up in flames. Because nobody knows better than the developer how an application must react to given events, and what it needs.

They then started calling it ‘Design for Failure’. Availability is no longer the job of whoever runs the infrastructure: it is the application that is designed to cope with any event or misfortune, and the underlying structure just does the heavy lifting.

In the ‘Design for Failure’ model everyone does their own job: the developer knows the application and takes care of making it work, while the infrastructure operator takes care of performance but no longer goes down endless dead-end tunnels to guarantee availability. Everyone saves money, because everything is simpler, with fewer overlaps. Everyone wins: the only losers are those who have no desire to innovate.

That’s why this model is not a failure, as so many describe it: it is the future.
