Wang, Daoyuan
Cellphone: +86-152-1686-1267
E-mail: me at daoyuan.wang
1 EDUCATION
Bachelor of Engineering, Zhejiang University, Hangzhou, China Oct 2009 to Jun 2013
Enrolled in the HE Zhijun Honored Class and graduated with a HE Zhijun Certification.
Won Student of Excellence in 2011-2012.
Involved in GIVE Lab during undergraduate studies. Major: Computer Science
GPA: 3.78/4.0 (overall), 3.84/4.0 (last 2 years)
2 WORK EXPERIENCE
Senior Software Engineer at Alibaba Inc Jun 2018 to Present
Daoyuan works as a senior software engineer at Alibaba Inc. He is part of the EMR team,
providing big data solutions to customers of Aliyun (aka Alibaba Cloud).
Senior Software Engineer at Intel Jul 2013 to Jun 2018
After graduation, Daoyuan joined the Big Data Technologies organization of Intel SSG. He
worked on optimizing Hadoop/Spark ecosystem software, developing new features and
improving performance for typical workloads.
- Project Panthera ASE Jul 2013 to Feb 2014
An approach to supporting PL/SQL on Apache Hive, so that legacy queries can run
on the Intel Distribution for Hadoop (IDH), of which it was a part. The project
was developed by three engineers including Daoyuan, and its source code is
available on GitHub. Daoyuan redesigned the basic processing structure of
correlated subquery un-nesting, substantially improving the pass rate of
queries. He also committed many bug fixes and new features, such as error
tracing, to the project.
- HiBench March 2014 to Dec 2014
Originating in Big Data Technologies, HiBench is a big data benchmark suite
widely adopted in the industry. Before Daoyuan's contributions, HiBench could
only work on MapReduce v1 of Apache Hadoop. Daoyuan fixed many compatibility
issues across different Hadoop versions and made it support MapReduce v2 of
Apache Hadoop (Hadoop YARN), which shipped in the official release of
HiBench 3.0. He then maintained the project until the end of 2014, answering
questions from the community. The project is open source, available on GitHub.
- Apache Spark Apr 2014 to Sept 2016
Apache Spark is a fast and general engine for large-scale data processing, and
has become a widely used platform for computing in both clusters and clouds.
Daoyuan contributed extensively to Apache Spark, mainly to Spark SQL, including
new features, bug fixes and performance optimizations for Intel Architecture.
- Intel OAP (codename: Spinach) Feb 2017 to Now
OAP is an effort from Intel BDT to support ad-hoc queries on Spark SQL. Many
companies have adopted Apache Spark as their default analysis engine for
large volumes of data, and data scientists may need the analysis platform to
act as a real-time query engine. Spark itself is not designed for ad-hoc
queries; OAP is a Spark package that accelerates query execution with
mechanisms such as indexing and caching, and unlocks the power of emerging
hardware devices.
Baidu, one of the largest internet companies in the world, has been using OAP
in production in its real-time analysis engine for advertisement strategies,
and reported a 1.5x-5x performance gain using OAP.
The project has been open source since Jun 2017. Daoyuan is a lead developer of
the project.
Software Engineer Intern at Intel Sept 2012 to Dec 2012
Daoyuan joined Intel IT in 2012 as a software engineer intern. He worked on the TAS
(Transcode as a Service) on Hadoop project, which provides a transcoding service using the
Hadoop platform and Intel hardware transcoding technologies. He tracked down a BSOD
bug, which was later identified as a bug in the Intel Graphics driver. He also
developed a thorough test framework for the project to enable automatic regression tests.
- Won 2nd place in Intel SWPC China 2012.
3 COMPUTER SKILLS
◇ Languages & Software: Spark, Hive, Hadoop, Scala, Java, Python, SQL, C/C++,
C#, PHP, R, Ruby, BASIC, Assembly, JavaScript, Linux Shell, Windows Shell, Verilog HDL,
MATLAB, OpenCV, OpenGL, LaTeX, MS Office.
4 AWARDS
- Third prize (19th place) in the 11th Zhejiang University Programming Contest
- Third prize (10th place) in the 12th Zhejiang University Programming Contest
- Intel SSG 2017 Q3 Group Recognition Award
5 TRANSLATION
Learning Spark: Lightning-fast Data Analysis (Chinese version) O’Reilly
Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark
(Chinese version) O’Reilly
6 PUBLIC TALKS
Tuning Garbage Collection for Spark Applications
QCon Shanghai 2015, Shanghai, Oct 2015
Spinach: Run Ad-hoc Queries on top of Spark SQL
DTCC 2017, Beijing, May 2017
OAP: Optimized Analytics Package for Spark Platform
Spark Summit 2017, San Francisco, Jun 2017
OAP: Optimized Analytics Package for Spark Platform
Strata Beijing 2017, Beijing, Jul 2017