Senior Software Development Engineer
6 months ago (18.5.2018 11:55)
Amazon Data Services Ireland Limited
Senior Software Development Engineer - Alerts and Anomaly Detection
The Network Alerts team at Amazon is based in Dublin, Ireland, and is part of the AWS Networking organization. Our mission is to process network telemetry messages and interpret them so that the network is monitored effectively. Our goal is to detect impact to customer traffic and fix the root cause within seconds. The network is the largest and fastest-growing network in the world. The customer traffic we monitor is your traffic, because thousands of the apps and websites you use are built on AWS.
Our traditional monitoring services are critical to the smooth running of the network, and they are truly large scale, processing over 30 million observations per second. The services are predominantly written in Java on Linux and they are large, even by Amazon standards. They are distributed over thousands of hosts in hundreds of global locations and operate at higher than "five nines" availability. In 2018 we began to incorporate anomaly detection techniques into our suite. We are using Data Science and Machine Learning (ML) approaches such as Exponential Smoothing, Distribution Modelling, Clustering, and Spatial Cosine Similarity. We have put these techniques into production and can now detect issues which were previously undetectable: for example, by dynamically choosing the right threshold for an alarm covering a million ports, by forecasting the traffic level of an internet exchange, or by finding a rare natural language log among a corpus of billions. We do all of this on live time series data.
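To give candidates a feel for the kind of technique named above, here is a minimal Python sketch of a dynamic alarm threshold built on exponential smoothing. It is purely illustrative and is not the team's production method: the function name `ewma_anomalies` and the parameters `alpha`, `k`, and `warmup` are assumptions made for this example.

```python
import math


def ewma_anomalies(values, alpha=0.3, k=3.0, warmup=5):
    """Return indices of points that deviate from the smoothed mean by more
    than k smoothed standard deviations.

    Illustrative sketch only: alpha (smoothing factor), k (threshold width)
    and warmup (points to observe before alarming) are hypothetical defaults.
    """
    mean = values[0]   # exponentially smoothed mean, seeded with the first point
    var = 0.0          # exponentially smoothed variance
    anomalies = []
    for i, x in enumerate(values[1:], start=1):
        deviation = x - mean
        # The threshold adapts to the recent variability of the series.
        threshold = k * math.sqrt(var)
        if i >= warmup and abs(deviation) > threshold:
            anomalies.append(i)
        # Standard incremental updates for an exponentially weighted mean
        # and variance. Note that an anomalous point still updates the
        # statistics, which temporarily widens the threshold.
        mean += alpha * deviation
        var = (1 - alpha) * (var + alpha * deviation ** 2)
    return anomalies


if __name__ == "__main__":
    series = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
    print(ewma_anomalies(series))
```

Because the threshold is derived from the series itself, the same code can in principle be applied to metrics with very different scales without hand-tuning a fixed limit for each one.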
With the success of anomaly detection in 2018 we are doubling down. In 2019 we will finish the implementation of six separate anomaly detector services and plug them into our "fire hose" of metric observations. We will build a supervised machine learning system that will ingest an expected million anomalies per minute and make sense of them for operators. We will use statistical techniques to learn associations between anomalies, alerts and external factors. These associations will become rules in an expert system which we will build, and it will increasingly assist humans in making associations and decisions on the relationship between alerts and anomalies. We will apply unsupervised machine learning algorithms to cluster this data into incidents. Those incidents will then largely be managed by our autonomous response system. Where necessary, a small number will be escalated to humans, and the system will continue to learn from their actions, which label the data so that it can be modeled better.
- You are passionate about problem solving and excited to solve problems at scale.
- You are an architect, a designer, a project leader, not just a programmer.
- You talk directly to your customers and deliver software which delights them.
- You choose the best tool and language for the problem at hand and are not zealous for any single technology.
- You believe in code reviews and automated testing as a core part of writing great software.
- You deploy and own your code in production. You monitor it and make it incrementally better for the benefit of your customers.
- You enjoy working with your team, learning from them and helping them in equal measure.
- You have a Computer Science degree, or equivalent experience and proficiency in computer science fundamentals: data structures, algorithm design and patterns.
- You are an expert in at least one modern programming language: Java, Python, C#, Clojure, Scala, etc.
- You are highly autonomous and detail-oriented, and you possess strong written and oral communication skills.
- You have read the Amazon Leadership Principles and you want to work at Amazon because you believe in them too.
- Experience as a high-availability service owner
- Experience taking a leading role in building complex software systems that have been successfully deployed in production
- Experience influencing software engineering best practices within your team
- Master's degree in Computer Science or equivalent