Takuya Akiba

Senior Research Scientist

Stability AI


An enthusiast of deep learning, experienced in research, engineering, and management. By blending research and engineering capabilities, I push the boundaries in AI evolution.

  • Deep Learning
  • High Performance Computing
  • Python, Rust, C/C++
  • PhD in Computer Science, 2015

    The University of Tokyo (received Dean's award; finished in 2 years)

  • M.S. in Computer Science, 2013

    The University of Tokyo (received Dean's award)

  • BSc in Artificial Intelligence, 2011

    The University of Tokyo

Experience Summary


It has been 20+ years since I started computer programming. I am the original creator of OSS ML software packages such as Optuna, ChainerMN, and Epochraft, and I have ranked among the top 10 in global coding competitions such as TopCoder and Google Code Jam.


I am enthusiastic about working on a wide range of problems, whether they are specific to a company or real-world issues, as well as those that are of academic interest. I have co-authored papers that have been published at top-tier academic conferences (see DBLP and Google Scholar).

Management & Leadership

My experience includes various types of management and leadership, such as leading without formal authority, serving as an engineering manager for a team, and being responsible for a group of teams consisting of 20+ SWEs.

Speciality Summary

Machine Learning (2016-present)

With 7+ years of industrial experience in ML, particularly in DL, I have extensive experience in research, building production models, and designing software packages. I am a Kaggle Grandmaster. I have co-authored papers that have been published at prestigious conferences such as NeurIPS, CVPR, and KDD. I am the creator of OSS ML software packages such as Optuna and ChainerMN. I have co-authored ML books on Kaggle and Optuna.

Algorithms and Data Structures (-2016)

I was an enthusiastic competitive programmer and achieved a maximum TopCoder rating of 3292, which was 4th in the world at that time. I won a bronze medal at the ACM ICPC World Finals 2012 and placed 9th at Google Code Jam 2010. My papers on algorithms and data structures were published at SIGMOD, CIKM, WWW, AAAI, KDD, and ICDM. I have co-authored a Japanese book on algorithms for competitive programmers, which has been translated and published in Korea, China, and Taiwan.


Besides my listed work experience, I have done internships at Google, Microsoft Research Asia, and Microsoft Research SVC.

Senior Research Scientist
Stability AI
June 2023 – Present Tokyo, Japan (Remote)
I started my new position at Stability AI in June 2023. I am carrying out research and development on generative AI, such as LLMs and diffusion models.
VP of ML Infrastructure
Preferred Networks, Inc.
May 2018 – March 2022 Tokyo, Japan
As Preferred Networks was a fresh start-up company, my responsibilities changed rapidly over time. Eventually, I became responsible for the whole ML infrastructure area, which consisted of four teams totaling 20+ engineers and researchers: the DL framework team (PyTorch and Chainer), the MN-Core compiler team, the Optuna team, and the HPC research team.
Preferred Networks, Inc.
July 2016 – July 2018 Tokyo, Japan
I engaged in a variety of activities related to deep learning, including research, prototyping, demonstrations, building models for customers, and developing software frameworks.
Assistant Professor
National Institute of Informatics
April 2015 – June 2016 Tokyo, Japan
I led research on practical data structures and algorithms for the analysis of large-scale graph data.


2nd place @ Google AI Open Images - Object Detection Track

I launched the PFDet project, assembled a team of six researchers and engineers, and led the project to success.

The goal of the PFDet project was to find ways to effectively utilize large-scale distributed deep learning for tasks such as object detection and instance segmentation, and to build a practical system for these tasks with ChainerMN. At the time, large-scale distributed deep learning was still in its infancy, and the methodology had not yet been established for more complex tasks such as object detection. We participated in a contest for Open Images, a massive object detection dataset, and successfully used 512 GPUs to train the model. Our model achieved the second-highest accuracy at the time. The PFDet model and system were used in internal projects.

Takuya Akiba, et al.: PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track. Open Images Challenge Workshop at ECCV 2018. (arxiv)
Yusuke Niitani, et al.: Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects. CVPR 2019. (arxiv)


I proposed the development of a new framework for hyperparameter optimization, Optuna, and performed the design and initial implementation. I also established and managed the development team.

Optuna offers several unique features that previous frameworks lacked. These features include a flexible “define-by-run” style search space description and a combination of sampling and pruning for efficient optimization. Additionally, it boasts scalability for large-scale distributed optimization, as well as simplicity, allowing users to easily experiment with optimization. Optuna has been widely adopted in internal projects and is also enthusiastically supported as open-source software outside the company. Its GitHub repo has garnered 7.3K stars, and its KDD’19 paper has been cited 1.8K times.
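The "define-by-run" idea mentioned above can be illustrated with a small, self-contained sketch. This is not Optuna's implementation or API: `ToyTrial`, `random_search`, and the toy objective are hypothetical names introduced only for illustration, and plain random sampling stands in for a real sampler. Pruning is not covered. The point is that the search space is not declared up front; it emerges as the objective function runs, so later parameters can depend on earlier choices.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial object (not Optuna's implementation)."""

    def __init__(self, rng):
        self.rng = rng
        self.params = {}  # filled in as the objective asks for values

    def suggest_float(self, name, low, high):
        value = self.rng.uniform(low, high)
        self.params[name] = value
        return value

    def suggest_categorical(self, name, choices):
        value = self.rng.choice(choices)
        self.params[name] = value
        return value


def objective(trial):
    # The search space is defined while this function runs:
    # "scale" only exists when the "quadratic" branch is taken.
    kind = trial.suggest_categorical("kind", ["linear", "quadratic"])
    x = trial.suggest_float("x", -5.0, 5.0)
    if kind == "linear":
        return abs(x - 1.0)
    scale = trial.suggest_float("scale", 0.1, 2.0)
    return scale * (x - 1.0) ** 2


def random_search(objective, n_trials, seed=0):
    """Minimize the objective with plain random sampling (assumed sampler)."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params


best_value, best_params = random_search(objective, n_trials=200)
```

Because the parameters are requested inside ordinary Python control flow, conditional and nested search spaces need no separate description language.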

Takuya Akiba, et al.: Optuna: A Next-generation Hyperparameter Optimization Framework. KDD 2019. (arxiv)

ResNet50 in 15 minutes w/ 1024 GPUs

I proposed the challenge of training ResNet50 in 15 minutes using 1024 GPUs with ChainerMN, and led the team to successfully complete this challenge. This was the world’s fastest record at the time.

To showcase the effectiveness of ChainerMN and identify potential limitations for future research and development, I believed it was necessary to conduct ultra-large-scale parallel experiments. Facebook Research had previously set a record of 1 hour using 256 GPUs for training ResNet-50 on ImageNet. We surpassed this by completing the task in 15 minutes using 1024 GPUs. As deep learning with 1024 GPUs was unprecedented at the time, it required significant effort to get things working properly, including debugging middleware from other companies and addressing hardware failures.

Takuya Akiba, et al.: Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. Deep Learning on Supercomputing Workshop at NeurIPS 2017. (arxiv)

4th place @ NIPS 2017 Adversarial Attack Competition

I formed a team and participated in the NIPS 2017 Adversarial Attack Competition. I devised a new method based on large-scale distributed deep learning, which was fundamentally different from the mainstream methods.

The mainstream method for adversarial attacks had been to perform backpropagation on the input image, calculate the gradient, and use the gradient to modify the image. I proposed a method of learning a CNN that is trained like a generative adversarial network (GAN), which takes an original image as input and outputs an adversarial example. I used 128 GPUs and ChainerMN to train this CNN, using both data parallelism and model parallelism.

Takuya Akiba, et al.: Non-Targeted Attack Track 4th Place Solution. Competition Workshop at NIPS 2017. (poster)
Alexey Kurakin, et al.: Adversarial Attacks and Defences Competition. The Springer Series on Challenges in Machine Learning. (arxiv)


I conducted basic research on distributed parallel deep learning and established a methodology. Then, I designed and implemented ChainerMN, which adds distributed training features to Preferred Networks’ deep learning framework Chainer.

Back in 2016, the question of whether deep learning could be effectively parallelized on a large scale remained unanswered, with only a variety of papers and prototype implementations available. I examined these sources closely, formulated my own hypothesis, and conducted experiments to establish an efficient methodology. In particular, despite async SGD being considered by many as the preferred approach at the time, I recognized early on that sync SGD could be better suited for certain tasks, such as image classification, and made it a priority to implement that method. This decision has since proven to be correct, as sync SGD is now much more widely used than async SGD. It gave Chainer and ChainerMN a significant advantage in the scalability race for deep learning frameworks during that period.

Following this success, the HPC field became a team, and I was assigned to lead the team. In addition, the company was steered in the direction of greatly increasing computing power by building supercomputers (MN-1, MN-2, and MN-3) and dedicated ASICs called MN-Core.
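As a rough illustration of the sync SGD approach discussed above, the following minimal sketch simulates synchronous data-parallel training in plain Python. It is not ChainerMN code: the 1-D linear model, the worker count, and the function names are assumptions chosen for illustration, and the averaging step stands in for the all-reduce communication a real system would perform across GPUs.

```python
import random


def gradient(w, batch):
    """Gradient of the mean squared error for the 1-D linear model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sync_sgd_step(w, shards, lr):
    """One synchronous step: each worker computes a gradient on its own shard,
    the gradients are averaged (the role an all-reduce plays), and every
    worker applies the same update, so all replicas stay identical."""
    local_grads = [gradient(w, shard) for shard in shards]  # parallel in practice
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


random.seed(0)
# Synthetic data generated from the true weight 3.0 (assumed for the demo).
data = [(x, 3.0 * x) for x in (random.uniform(-1.0, 1.0) for _ in range(64))]

num_workers = 4  # simulated workers; equal-sized shards
shard_size = len(data) // num_workers
shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]

w = 0.0
for _ in range(200):
    w = sync_sgd_step(w, shards, lr=0.5)
```

With equal-sized shards, the averaged gradient equals the gradient over the combined batch, so the simulated workers behave like a single large-minibatch SGD run. In async SGD, by contrast, workers would apply updates computed from stale weights.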

Takuya Akiba, et al.: ChainerMN: Scalable Distributed Deep Learning Framework. Workshop on ML Systems at NIPS 2017. (arxiv)
Seiya Tokui, et al.: Chainer: A Deep Learning Framework for Accelerating the Research Cycle. KDD 2019. (arxiv)