Speech processing researcher

Grupo FalaBrasil

Cassio T Batista

This is a legacy website. Please refer to @cassiotbatista instead.

I hold a PhD in Computer Science (2023) from the Federal University of Pará (UFPA) in Belém, Brazil. I currently do research in speech processing at Vivoka in Metz, France. My professional experience is mostly in speech recognition and machine learning.

Interests

Speech processing

Education

PhD in Computer Science, 2023

Federal University of Pará

MSc in Computer Science, 2017

Federal University of Pará

BSc in Computer Engineering, 2016

Federal University of Pará

Skills

What am I (supposed to be) good at?

Speech Recognition

Kaldi, Icefall (K2), SpeechBrain, etc.

Linux & Tools

Arch, XMonad, Vim, Git, Python, C, etc.

Machine Learning

PyTorch, Scikit-learn, ONNX, etc.

Experience

Speech processing research

Vivoka

Jun 2023 – Present Metz, France

Speech recognition

Speech processing research

CPqD

Mar 2021 – May 2023 Campinas, Brazil

Speech-based technologies:

Lattice rescoring with n-gram and neural network language models
ASR combined with voice activity detection (VAD) and speech emotion recognition (SER)

PhD in Computer Science

Federal University of Pará (UFPA)

Dec 2017 – Oct 2022 Belém, Brazil

Speech-based technologies:

Kaldi ASR for Brazilian Portuguese
Utterance copy TTS in English using Klatt and deep learning techniques

MSc in Computer Science

Federal University of Pará (UFPA)

Mar 2017 – Dec 2017 Belém, Brazil

A universal remote control system in C++ for people with upper-limb motor disabilities, so they can control a TV through alternative input methods.

OpenCV for head gesture recognition
PocketSphinx for speech recognition
Adaptive switches in hardware

Research Internship

Embrapa

Mar 2016 – Dec 2016 Belém, Brazil

A simulator in Python for the routing and wavelength assignment (RWA) problem over transparent, wavelength-multiplexed optical networks using Genetic Algorithms.

Summer Internship

Óbuda University (OE)

Mar 2014 – Jan 2015 Budapest, Hungary

Development of English speech modules for controlling Teki, a TurtleBot-based personal home assistant robot:

PocketSphinx desktop on Linux + ROS (offline)
Android’s Google ASR (online Wi-Fi UDP connection)

Research Internship

Federal University of Pará (UFPA)

Jan 2012 – Feb 2016 Belém, Brazil

Development of resources and applications for speech recognition in Brazilian Portuguese:

PyQt4 CFG/BNF grammar tester for Julius
Acoustic model training on CMU Sphinx for KDE Simon Listens
Android client + Julius server vs. Google’s Android ASR

Projects

Head Remote

A system that translates the user’s head gestures into remote commands for electronic devices

Speech Remote

A remote control system that translates the user’s spoken words into commands to electronic devices

ASR BBB

Speech recognition and TV remote control using Android and BeagleBone Black

Recent Publications

Experiments on Kaldi-based Forced Phonetic Alignment for Brazilian Portuguese

Accepted for publication

Cassio Batista, Nelson Neto

Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools

Forced phonetic alignment in Brazilian Portuguese using Kaldi tools.

Ana Larissa Dias, Cassio Batista, Daniel Santana, Nelson Neto

A Parallel Strategy for a Genetic Algorithm in Routing Wavelength Assignment Problem Using GPU with CUDA

Routing and wavelength assignment simulator on NVIDIA CUDA GPUs.

Esdras La-Roque, Cassio Batista, Josivaldo Araújo

Evaluating Alternative Interfaces Based on Puff, Electromyography and Dwell Time for Mouse Clicking

Statistical comparison of three mouse-click methods: mouth puffing, EMG, and dwell time. Two of the three were implemented in hardware, and their schematics have been open-sourced.

Erick Campos, Denis Martins, Suzane dos Santos, Renan Cunha, Cassio Batista, Nelson Neto

Utterance Copy in Formant-based Speech Synthesizers Using LSTM Neural Networks

Estimating the input parameters of the Klatt88 formant-based speech synthesizer with long short-term memory (LSTM) neural networks.

Cassio Batista, Renan Cunha, Pedro Batista, Aldebaro Klautau, Nelson Neto

Recent Posts

Free Online Courses

Contact