York's Blog

Enjoy the Online Judge


An online judge is an online system that tests programs in programming contests. Online judges are also used to practice for such contests, and many of them organize their own contests.

The system can compile and execute code, and test it with pre-constructed data. Submitted code may be run with restrictions, including a time limit, a memory limit, security restrictions, and so on. The output of the code is captured by the system and compared with the standard output. The system then returns the result. When a mistake is found in the standard output, the affected submissions must be rejudged using the same method.
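To make the judging loop concrete, here is a minimal sketch in Python of the run, capture, and compare steps described above. It only enforces a time limit; memory limits and sandboxing are left out. The names `judge` and `SUBMISSION` and the verdict strings are my own assumptions, not the API of any real online judge.

```python
import subprocess
import sys


def judge(command, input_text, expected_output, time_limit=2.0):
    """Run one submission on one test case and return a verdict string."""
    try:
        result = subprocess.run(
            command,
            input=input_text,
            capture_output=True,
            text=True,
            timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        return "Time Limit Exceeded"
    if result.returncode != 0:
        return "Runtime Error"
    # Compare line by line, ignoring trailing whitespace.
    # Real judges differ on how strict this comparison is.
    actual = [line.rstrip() for line in result.stdout.strip().splitlines()]
    expected = [line.rstrip() for line in expected_output.strip().splitlines()]
    return "Accepted" if actual == expected else "Wrong Answer"


# A stand-in "submission" that adds two integers read from standard input.
SUBMISSION = [sys.executable, "-c", "a, b = map(int, input().split()); print(a + b)"]

if __name__ == "__main__":
    print(judge(SUBMISSION, "1 2\n", "3\n"))
```

Rejudging is then just calling `judge` again for every stored submission once the expected output has been corrected.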


I started playing with online judges as a high school student, when I was first learning C++. At that time, UVA Online Judge was the most popular one. The system offers problem sets on algorithms, data structures, number theory, and other programming fundamentals. It is fun and actually makes your brain bigger.

Some other interesting online judge sites are:

After graduating from college, I programmed more on data storage, business logic, and application interfaces, and had almost no time to play with online judges. Recently, to prepare for job interviews, I started playing again and found it still joyful.

The community is getting bigger and more advanced. For example, HackerRank has a discussion forum and supports almost all modern languages. There are also many varieties:

Thanks to all the online judge communities, you really make the world better.

Learning a Programming Language: PHP as an Example


The following are the steps I take when learning a programming language. I also suggest them to new engineers. Let's take PHP as an example.

Official tutorial, language reference, and function references

We can find all of them on the language's official website.

  • Tutorial gives us an overview of the language.
  • Language Reference helps us understand syntax and the language features.
  • Scanning the Function Reference helps us understand the built-in functions so we can reference them when we really need them.

Coding style and common idioms

The community usually has a standard coding style and common idioms. If you cannot find them on the official website, try Stack Overflow.

These guides usually teach us not only the best practices but also point to other tools in the ecosystem.

Common tools and the ecosystem

There are common tasks while developing an application, like testing, debugging, building, deployment, and package management.

We might also want to find out how those tools can be integrated with our IDE and continuous integration server.

Extensions, frameworks and libraries

Sometimes the features you need, like ImageMagick or Mcrypt, are already built in. All you have to do is make sure the module is enabled in your binary. Sometimes they are not.

For some languages, like PHP, we should also understand how language extensions work.

However, do not reinvent the wheel. Depending on the application's needs, survey the options and pick the most suitable frameworks and libraries.

Setup in development and production environment

Get the latest security updates from a trusted source.

A Startup's Server Architecture


I have worked at EZTABLE for three years. The company is quite successful in Asia, and the engineering team grew from two people to almost fifty. There is no system administrator or DevOps engineer, so I spent about 5% of my time on DevOps work.

The following are some notes I took on the server architecture and the components used. Although it is not perfect, it works and actually generates revenue.


AWS EC2. Keep most instances in us-east-1d to reduce data transfer fees between availability zones. Keep one DB slave in us-east-1b to recover from an availability zone outage.

Currently not using VPC, which may cause performance and security issues. Try to use VPC in the future.

Shared File System


If random access is needed, use NFS.

If cheap data archiving is needed, use AWS Glacier.


Currently using GoDaddy. Try to migrate to Route 53 for better control.

Content Delivery Network (CDN)

AWS CloudFront. SSL support with a custom domain name on CloudFront costs $600 USD per month. As a result, use the configuration file and a protocol-relative URL like the following for static files on the CDN, so that they work over both http and https.

<img src="//d1gpbxqmt7wq2i.cloudfront.net/image.jpg" />

This can be done in AssetPipeline to support both local development and production.
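As a rough illustration of the idea (not the AssetPipeline's actual API), the sketch below builds a protocol-relative URL when a CDN host is configured and falls back to a local path otherwise. The function names and the `/static` prefix are hypothetical.

```python
def asset_url(path, cdn_host=None):
    """Return a URL for a static file.

    With a CDN host, build a protocol-relative URL ("//host/path") so the
    browser reuses the scheme (http or https) of the current page. Without
    one, as in local development, fall back to a path on the local server.
    """
    path = "/" + path.lstrip("/")
    if cdn_host:
        return "//" + cdn_host + path
    return "/static" + path  # hypothetical local static prefix


def img_tag(path, cdn_host=None):
    """Render an <img> tag that points at the resolved asset URL."""
    return '<img src="{}" />'.format(asset_url(path, cdn_host))
```

In production the CDN host would come from the configuration file; in development it stays unset, so the same template works in both environments.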

If we really need SSL support with a custom domain name, use Nginx as a reverse proxy for S3 static hosting.

The legacy design uses static-host.eztable.com as the origin for CloudFront. However, modern designs like ImageService use S3 as the origin. Try to use S3 as the origin as much as possible to ease deployment.

Cluster, Data Processing

AWS Elastic MapReduce. One medium instance for each of the MASTER and CORE groups, plus an arbitrary number of large spot instances, is enough for the current data scale.




MySQL, Percona distribution. Use the InnoDB storage engine to support master-master replication in the future. If we really need support, buy their service.

AWS RDS would be a second choice, since it is much more costly for the same functionality.

For scaling issues, try to apply the following solutions:

  • Consider redesigning the data model, or not using a database at all.
  • High-I/O EBS and RAID 10.
  • More powerful instance type.
  • Table partitioning.
  • Use different servers for vertically partitioned databases.
  • Table sharding.
  • Use a NoSQL solution like MongoDB, HBase, or AWS DynamoDB, after careful analysis.
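For the table sharding option above, a minimal sketch of how a record key could be mapped to a shard is shown below. The function names, the `orders` base table name, and the choice of CRC32 as the hash are assumptions for illustration only.

```python
import zlib


def shard_for(key, shard_count):
    """Map a record key to a shard index in the range [0, shard_count)."""
    checksum = zlib.crc32(str(key).encode("utf-8"))
    return checksum % shard_count


def table_for(key, shard_count, base_name="orders"):
    """Return the name of the sharded table that holds this key."""
    return "{}_{:02d}".format(base_name, shard_for(key, shard_count))
```

Note that with simple modulo sharding, changing `shard_count` moves most keys to a different shard, which is one reason sharding sits near the bottom of the list.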

Cache, Memory Storage





InnoDB does not support full-text search before MySQL 5.6.

Solr. We currently use 3.2, but the latest stable version is 4.0. We could not migrate because the native PHP extension does not support 4.x. Migrating to the Solarium client would solve the problem.



Scribe has not been actively maintained in recent years. However, it is still a solid choice (Facebook uses the same code on their production servers). Make sure the scribed process on job001 is always alive; otherwise the buffer servers' hard disks will fill up.

Flume might be a better choice, since it is actively maintained and can be integrated with many other components.

Node.js Socket.io Server

Combined with Redis pub/sub, this provides solid real-time messaging.

Load Balancer


Web Server

  • PHP: Apache2 with mod_php5
  • Static Files: Nginx, S3, CDN
  • Node.js

You might want to replace Apache2 and mod_php5 with php-fpm behind Nginx for better performance.

My Mac Toolbox 2013


A memo about the software on my Mac.

AirServer is an AirPlay receiver, similar to Apple TV, so I can mirror the display of my iOS devices.

Caffeine temporarily disables the sleep feature and screen savers, so I can give presentations smoothly.

MusicBrainz Picard uses an album-oriented approach with audio fingerprints to correct the music file tags.

Vox is the best music player on OS X IMO.

MPlayerX. MPlayer is the best media player on all platforms.

coconutBattery checks current battery health.

Cyberduck is a client for FTP, SFTP, WebDAV, S3, Google Cloud Storage, and just about any other storage service you can think of.

Filezilla is a great FTP, SFTP client.

TweetDeck, best Twitter client ever.

TinkerTool gives access to additional preference settings Apple has built into OS X. I need it to tune the font size.

iTerm2. Everyone needs a terminal, and this one is much, much better than the default one.

Welly is my favorite BBS client.

Sublime Text 2 is my favorite text editor in addition to vim.

LibreOffice. Sometimes we still have to open docx files. I also use LibreOffice to make my presentation slides.

Dropbox. Simply cannot live without it.

Evernote. My second brain. =)

VirtualBox. I do most of my development in a Linux environment.

Xcode. I also develop apps.

f.lux makes the color of your computer's display adapt to the time of day, warm at night and like sunlight during the day.

Spotify. Nice music streaming.

Divvy. Great window manager.

The Unarchiver.

XQuartz is the X.Org X Window System that runs on OS X. I run some Linux applications on my Mac, most of them installed with Homebrew. For example: Meld, a great diff and merge tool.

Linux System Diagnostic Tools


Show utilization of each CPU individually.

mpstat -P ALL

Sample CPU usage every 2 seconds, 5 times

sar -u 2 5

Past samples can also be read from the sysstat logs

sar -f /var/log/sysstat/sa09

See who is eating CPU

ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10

Show I/O stats every 5 seconds

iostat -xtc 5

Show resource stats, using megabytes as the unit

vmstat -S M 2 5

Show NFS stats


Find memory used by a process using pmap

pmap <pid>

Display sockets summary

ss -s

Show open sockets with process name

ss -pl

Show all connections to a given remote address or port

ss dst
ss dst 
ss dst

Find all connections whose local address is x.x.x.x, where x.x.x.x is this machine's IP address

ss src x.x.x.x

Show all connections

netstat -nat

Count connections per remote IP to spot a possible DDoS attack

netstat -atun | awk '{print $5}' | cut -d: -f1 | sed -e '/^$/d' |sort | uniq -c | sort -n
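The pipeline above takes the foreign address column, drops the port, and counts how often each IP appears. When I want the same numbers inside a script, a rough Python equivalent looks like the sketch below. The function name is my own, the sample text is made up with documentation-only IP addresses, and only IPv4 addresses are handled.

```python
from collections import Counter


def connections_per_ip(netstat_output):
    """Count connections per remote IPv4 address in `netstat -atun` output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers: data rows start with a protocol name such as tcp or udp.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        remote_ip = fields[4].rsplit(":", 1)[0]  # drop the port
        counts[remote_ip] += 1
    return counts


# Made-up sample output, using documentation-only IP addresses.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:80             203.0.113.7:51234       ESTABLISHED
tcp        0      0 10.0.0.5:80             203.0.113.7:51235       ESTABLISHED
tcp        0      0 10.0.0.5:80             198.51.100.2:40000      TIME_WAIT
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

if __name__ == "__main__":
    for ip, count in connections_per_ip(SAMPLE).most_common():
        print(count, ip)
```

An IP with an unusually high count compared to the rest is the one worth looking at first.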

Summary statistics by protocol

netstat -s

Trace the system calls of a process

strace -o output.txt /bin/foo
strace -p 12345 -s 80 -o /tmp/debug.httpd.txt
strace -e trace=open,read -p 22254 -s 80 -o debug.http.txt

Book Recommendation: The Architecture of Open Source Applications


Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another's mistakes rather than building on one another's successes.

Our goal is to change that. In these two books, the authors of four dozen open source applications explain how their software is structured, and why. What are each program's major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to these books provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, these books are the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, these books can help you too.

The Architecture of Open Source Applications


Software architecture is hard not only because of its complexity but also because people seldom have the chance to evolve a system through trial and error. The best way to become a software architect is to design and implement software. The book explains not only how but also why certain design decisions were made.

Thanks to the AOSA book: although I am not familiar with all the software mentioned, I learned a lot from it.

Linux Networking Commands


A quick note on Linux networking commands.



View the IP address, hardware (MAC) address, and MTU size assigned to an interface.

Set MTU size. The maximum transmission unit (MTU) of a communications protocol of a layer is the size (in bytes) of the largest protocol data unit that the layer can pass onwards.

ifconfig eth0 mtu XXXX

Set promiscuous mode. Promiscuous mode causes the controller to pass all traffic it receives to the central processing unit (CPU) rather than passing only the frames that the controller is intended to receive. Normally used for packet sniffing.

ifconfig eth0 promisc

Show all NICs including disabled ones.

ifconfig -a


Enable a specific interface.


Disable a specific interface.


View and set the speed and duplex of a NIC.


ifconfig for wireless.



View ARP(Address Resolution Protocol) table.


Send ICMP ECHO_REQUEST packet to network hosts.


ping in parallel. Unlike ping, fping is meant to be used in scripts and its output is easy to parse.


Show the number of hops taken to reach the destination, as well as the path the packets travel.


traceroute using TCP packets rather than ICMP Echo Requests and Replies.


mtr combines the functionality of the traceroute and ping programs in a single network diagnostic tool.


Show and manipulate the IP routing table.

route add -net gw
route del -net gw
route add default gw

Status / Monitoring / Packet


Display connection info.

netstat -a

Display routing table.

netstat -r

Display service name with PID.

netstat -tp

Display all TCP connections continuously, refreshing every 5 seconds.

netstat -ac 5 | grep tcp


Packet analyzer.

iptraf / iptraf-ng

TCP and UDP traffic statistics.



Query DNS related information like A Record, CNAME, and MX Record. In dnsutils debian package.

Query all.

dig yahoo.com ANY +noall +answer

DNS reverse lookup.

dig -x +short


Query DNS related information.

Query all available DNS records.

nslookup -query=any yahoo.com

Debug mode.

nslookup -debug yahoo.com


Find name to IP or IP to name in IPv4 or IPv6 and also query DNS records. Use -t option to find out DNS Resource Records like CNAME, NS, MX, and SOA.


Get hostname of the machine.


Check the internic database for proper hostnames.







Server / Client.

nc -l 2389
nc localhost 2389
HI, server




Text web browser.


Text web browser.



Broadcast message to all logged in users.


Send message to a specific user in the specified tty.



Configuration Files


Network configuration file.


DNS configuration.


Hostname configuration.


Local DNS.


File descriptors and other limits.

System Status












Log Everything with Scribe


Why Logging?

The most important thing in running a web service is keeping it available 24/7. However, things always happen in unexpected ways. Having a monitoring system improves the response time when the service operates abnormally. Having a logging system allows DevOps and software developers to foresee problems before they happen.

If you are going to run a web service, I suggest you log everything from day one.

What is a Good Logging System?

When choosing a logging system, I would always consider the following features:

  • Capability of logging from different programming languages and sources.
  • Integration with storage components like S3 and HDFS.
  • Support for server farms and fault recovery.

Currently, the best open source solutions are Apache Flume, fluentd, and Scribe.

Those solutions support at least one of the popular RPC libraries, like Thrift, or integrate very well with a language's logging frameworks, like Log4j or Monolog. Storing logs in various storage systems and a distributed architecture are must-have features.

Scribe, Open Sourced by Facebook

Facebook open sourced Scribe in 2008. As the following figure shows, the architecture is a simple tree model.

      'client'                    'central'
----------------------------     --------------------
| Port 1464                 |    | Port 1463         |
|        ----------------   |    | ----------------  |
|     -> | scribe server |--|--->| | scribe server | |
|        ----------------   |    | ----------------  |
|                |          |    |    |         |    |
|            temp file      |    |    |    temp file |
|---------------------------     |-------------------
                                      |
                               --------------------
                               | /tmp/scribetest/ |
                               --------------------

A Scribe daemon runs on each server. Applications use Thrift to communicate with the Scribe server on localhost. The local Scribe daemon buffers the logs and forwards them to the upstream Scribe server. Finally, the central Scribe server appends the logs to the filesystem.

A detailed installation guide and examples are on GitHub. Basically, you can get Scribe into production with the following steps:

  1. Install Thrift.
  2. Install FB303 from Thrift's contrib folder.
  3. Install Scribe.
  4. Write the configuration file for buffers and central. Start the Scribe daemon.
  5. Generate the source code for your language from the scribe/if folder.

Log Format

I highly recommend storing logs in JSON format, which can be processed by virtually every language.
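As a small sketch of what this could look like, each event becomes one JSON object on one line, so any consumer can split the log on newlines and parse each line on its own. The field names (`time`, `category`, `event`) and function names are hypothetical, not part of Scribe.

```python
import json
import time


def format_log(category, event, **fields):
    """Serialize one log event as a single-line JSON string."""
    record = {"time": int(time.time()), "category": category, "event": event}
    record.update(fields)
    # sort_keys gives stable output; json.dumps escapes newlines inside
    # values, so the result always fits on one line.
    return json.dumps(record, sort_keys=True)


def parse_log(line):
    """Turn a JSON log line back into a dictionary."""
    return json.loads(line)


if __name__ == "__main__":
    print(format_log("web", "login", user_id=42))
```

The string returned by `format_log` would be the message handed to the Scribe client for the given category.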


I had a hard time compiling Scribe after I upgraded the libboost package. I found this blog post, which solved my problem. Basically, just add the following when running ./configure