I started playing online judges in high school, when I was first learning C++. At that time, UVa Online Judge was the most popular one. The system offers problem sets on algorithms, data structures, number theory, and other programming fundamentals. It is fun and it actually makes your brain bigger.
After graduating from college, I programmed more on data storage, business logic, and application interfaces, and had almost no time to play online judges. Recently, to prepare for job interviews, I started playing online judges again and found it still joyful.
The community is getting bigger and more advanced. For example, HackerRank has a discussion forum and supports almost all modern languages. There are also many varieties:
I have worked at EZTABLE for three years. The company is quite successful in Asia, and the engineering team grew from two people to almost fifty. There is no system administrator or DevOps engineer. I spent 5% of my time on DevOps work.
The following are some notes I took on the server architecture and the components used. Although not perfect, it works and actually generates revenue.
AWS EC2. Keep most instances in us-east-1d to reduce cross-availability-zone data transfer fees. Keep one DB slave in us-east-1b to recover from an availability-zone failure.
Currently not using VPC, so there will be performance and security issues. Try to use VPC in the future.
Shared File System
If random access is needed, use NFS.
If cheap data archiving is needed, use AWS Glacier.
Currently using GoDaddy; try to migrate to Route 53 for better control.
Content Delivery Network (CDN)
AWS CloudFront. SSL support for CloudFront will cost you $600 USD per year. As a result, use the configuration file and the following for static files on the CDN to support both HTTP and HTTPS.
This can be done in AssetPipeline to support both local development and production.
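One common way to let a static asset follow the scheme of the page that includes it is a protocol-relative URL. The sketch below assumes that approach; the `cdn_url` helper and the placeholder host name are hypothetical and are not taken from the actual configuration.

```python
from urllib.parse import urljoin

# Hypothetical placeholder; a real distribution has its own cloudfront.net host.
CDN_HOST = "dxxxxxxxxxxxxx.cloudfront.net"


def cdn_url(path: str, host: str = CDN_HOST) -> str:
    """Build a protocol-relative URL ("//host/path") for a static asset."""
    return "//" + host + "/" + path.lstrip("/")


if __name__ == "__main__":
    asset = cdn_url("/css/app.css")
    # A browser resolves the URL against the page, so it inherits the scheme.
    print(urljoin("http://www.example.com/", asset))
    print(urljoin("https://www.example.com/", asset))
```

Because the URL carries no scheme, the same markup works on both HTTP and HTTPS pages without a custom certificate on the CDN.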
If we really need SSL support with a custom domain name, use Nginx as a reverse proxy for S3 static hosting.
The legacy design uses static-host.eztable.com as the origin for CloudFront. However, newer designs like ImageService use S3 as the origin. Use S3 as the origin wherever possible to ease deployment.
Cluster, Data Processing
AWS Elastic MapReduce. One medium instance each for the MASTER and CORE groups, plus an arbitrary number of large spot instances, is enough for the current data scale.
Scribe has not been actively maintained in recent years. However, it is still a solid choice (Facebook uses the same code on its production servers). Make sure the scribed process on job001 is always alive; otherwise the buffer servers' hard disks will fill up.
Flume might be a better choice since it is actively maintained and can be integrated with many other components.
Node.js Socket.io Server
Combined with Redis pub/sub, this provides us with solid real-time messaging.
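The fan-out idea behind this setup can be sketched in a few lines. This is an in-memory illustration only: `Broker` stands in for Redis pub/sub and `SocketServer` for one Socket.io process, and neither class mirrors the real Redis or Socket.io APIs.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """In-memory stand-in for a pub/sub broker (hypothetical, not the Redis API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


class SocketServer:
    """Stand-in for one real-time server process and its connected clients."""

    def __init__(self, name: str, broker: Broker, channel: str) -> None:
        self.name = name
        self.delivered: List[str] = []
        broker.subscribe(channel, self._on_message)

    def _on_message(self, message: str) -> None:
        # A real server would emit the message to its connected sockets here.
        self.delivered.append(message)


if __name__ == "__main__":
    broker = Broker()
    servers = [SocketServer(f"socket-{i}", broker, "notifications") for i in range(2)]
    broker.publish("notifications", "table 12 is ready")
    for server in servers:
        print(server.name, server.delivered)
```

The point of the broker is that a publisher does not need to know which server process holds a given client connection; every process subscribed to the channel receives the message.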
Software architecture is hard not only because of its complexity but also because people seldom get the chance to evolve a system through trial and error. The best way to become a software architect is to design and implement software. The book explains not only how but also why certain design decisions are made.
Thanks to the AOSA book. Although I am not familiar with all the software mentioned, I learned a lot from it.
View the IP address, hardware (MAC) address, and MTU size assigned to an interface.
Set MTU size. The maximum transmission unit (MTU) of a communications protocol of a layer is the size (in bytes) of the largest protocol data unit that the layer can pass onwards.
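As a worked example of what the MTU limits, the sketch below computes how much TCP payload fits in one packet and how many packets a transfer needs. It assumes IPv4 and TCP headers of 20 bytes each with no options; the function names are illustrative.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: minimum IPv4 header, no options
TCP_HEADER_BYTES = 20   # assumption: minimum TCP header, no options


def tcp_payload_per_packet(mtu: int) -> int:
    """Largest TCP payload that fits in a single IP packet of size `mtu`."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of packets needed to carry `payload_bytes` of TCP data."""
    return math.ceil(payload_bytes / tcp_payload_per_packet(mtu))


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, tcp_payload_per_packet(mtu), packets_needed(1_000_000, mtu))
```

With the standard Ethernet MTU of 1500, each packet carries 1460 bytes of payload, so 1,000,000 bytes need 685 packets; with a 9000-byte jumbo-frame MTU the same transfer needs 112.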
Set promiscuous mode. Promiscuous mode causes the controller to pass all traffic it receives to the central processing unit (CPU) rather than passing only the frames that the controller is intended to receive. Normally used for packet sniffing.
Show all NICs including disabled ones.
Enable a specific interface.
Disable a specific interface.
View or set the speed and duplex of a NIC.
ifconfig for wireless.
View the ARP (Address Resolution Protocol) table.
Send ICMP ECHO_REQUEST packets to network hosts.
ping in parallel. Unlike ping, fping is meant to be used in scripts and its output is easy to parse.
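To illustrate why that output is easy to parse, here is a minimal sketch that turns fping's default `<host> is alive` / `<host> is unreachable` lines into a dictionary. The `parse_fping` helper is hypothetical, and the sample addresses are documentation placeholders.

```python
from typing import Dict


def parse_fping(output: str) -> Dict[str, bool]:
    """Map each host in fping's default output to True (alive) or False."""
    status: Dict[str, bool] = {}
    for line in output.splitlines():
        host, sep, state = line.strip().partition(" is ")
        if not sep:
            continue  # skip anything that is not "<host> is <state>"
        status[host] = state.startswith("alive")
    return status


if __name__ == "__main__":
    sample = "192.0.2.1 is alive\n192.0.2.2 is unreachable\n"
    print(parse_fping(sample))
```

In a script, the text would typically come from running fping through `subprocess.run(..., capture_output=True, text=True)` and reading `stdout`.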
Show the number of hops taken to reach the destination and the path the packets travel.
traceroute using TCP packets rather than ICMP Echo Requests and Replies.
mtr combines the functionality of the traceroute and ping programs in a single network diagnostic tool.
Show and manipulate the IP routing table.
Status / Monitoring / Packet
Display connection info.
Display routing table.
Display service names with PIDs.
Display promiscuous mode and refresh every 5 seconds.
iptraf / iptraf-ng
TCP and UDP traffic statistics.
Query DNS-related information such as A, CNAME, and MX records. Part of the dnsutils Debian package.
DNS reverse lookup.
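A reverse lookup is a query for the PTR record of a special name derived from the address. The sketch below only builds that name with Python's standard `ipaddress` module and sends no DNS query; the `reverse_name` helper is illustrative.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Return the DNS name whose PTR record answers a reverse lookup."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(reverse_name("192.0.2.10"))   # IPv4: octets reversed under in-addr.arpa
    print(reverse_name("2001:db8::1"))  # IPv6: nibbles reversed under ip6.arpa
```

For IPv4 the octets are reversed and placed under `in-addr.arpa`, so 192.0.2.10 becomes `10.2.0.192.in-addr.arpa`.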
Query DNS related information.
Query all available DNS records.
Resolve a name to an IP or an IP to a name, in IPv4 or IPv6, and query DNS records. Use the -t option to look up DNS resource records such as CNAME, NS, MX, and SOA.
Get hostname of the machine.
Check the internic database for proper hostnames.
Server / Client.
Text web browser.
Text web browser.
Broadcast message to all logged in users.
Send message to a specific user in the specified tty.
The most important thing in running a web service is keeping it available 24/7. However, things always happen in unexpected ways. A monitoring system shortens the response time when the service behaves abnormally. A logging system allows DevOps and software developers to foresee problems before they happen.
If you are going to run a web service, I would suggest logging everything from day one.
What is a Good Logging System?
When choosing a logging system, I would always consider the following features:
Capability of logging from different programming languages and sources.
Integration with storage components like S3 and HDFS.
Those solutions support at least one of the popular RPC libraries, like Thrift, or integrate very well with a language's logging framework, such as Log4j or Monolog. The ability to store logs in various storage systems and a distributed architecture are must-have features.
Scribe, Open Sourced by Facebook
Facebook open sourced Scribe in 2008. As the following figure shows, the architecture is a simple tree model.
A Scribe daemon runs on each server. Applications use Thrift to communicate with the Scribe server on localhost. The local Scribe daemon buffers the logs and forwards them to the upstream Scribe server. Finally, the central Scribe server appends the logs to the filesystem.
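The tree model described above can be sketched as a small store-and-forward simulation. This is an illustration only: `LogNode`, its `available` flag, and the in-memory `files` dictionary are hypothetical stand-ins and do not reflect Scribe's actual Thrift interface.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Optional, Tuple


class LogNode:
    """One node in the tree: buffers locally, forwards to its upstream."""

    def __init__(self, name: str, upstream: Optional["LogNode"] = None) -> None:
        self.name = name
        self.upstream = upstream
        self.available = True  # flip to False to simulate an outage
        self.buffer: Deque[Tuple[str, str]] = deque()
        # Stand-in for the filesystem on the central (root) node.
        self.files: DefaultDict[str, List[str]] = defaultdict(list)

    def log(self, category: str, message: str) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        if self.upstream is None:
            # Root of the tree: append to the per-category "file".
            self.files[category].append(message)
            return
        self.buffer.append((category, message))
        self.flush()

    def flush(self) -> None:
        """Forward buffered messages in order; stop and keep them on failure."""
        while self.buffer:
            category, message = self.buffer[0]
            try:
                self.upstream.log(category, message)
            except ConnectionError:
                return  # upstream is down; retry on the next flush
            self.buffer.popleft()


if __name__ == "__main__":
    central = LogNode("central")
    local = LogNode("web01", upstream=central)
    local.log("access", "GET /")
    central.available = False
    local.log("access", "GET /about")  # stays in the local buffer
    central.available = True
    local.flush()
    print(central.files["access"])
```

The sketch also shows why a dead upstream is a problem over time: messages accumulate in the downstream buffer until the upstream comes back.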
I have been a big fan of RSS since 2006. Since Google Reader ended its service on July 1st, I kept looking for alternatives. After trying The Old Reader and Feedly, I found Digg Reader is the one I got most used to.
The interface and keyboard shortcuts are almost the same as Google Reader's. With Instapaper integration and a pretty good iOS app, I am quite comfortable with the transition.