In general, a Data Scientist should have:
- Strong background in Mathematics (Calculus, Linear Algebra), enough to read mathematical notation and translate it into code
- Strong background in Probability and Statistics (basic probability, confidence intervals, hypothesis testing, A/B testing, regression, GLMs, etc.)
- Data Structures & Algorithms: demonstrated in at least one programming language such as Python or Java
- Big Data: Hands-on experience with HPC/AWS/GCP, with Apache Spark/Hadoop/Kafka for Big Data management, and with Hive/Pig for data processing/ETL
- Machine Learning & Data Mining: Knowledge of classical ML algorithms and hands-on skills with packages such as scikit-learn, NumPy, and SciPy, as well as Deep Learning with TensorFlow and Keras, and ML on Big Data using Spark MLlib
- Databases: Relational database models; hands-on experience with typical relational database software (Oracle, MySQL, MS SQL Server) and NoSQL stores (HBase, Cassandra, MongoDB, etc.)
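
As a small illustration of what "translating mathematical notation into code" can look like for the A/B testing item above, here is a minimal sketch of a two-sided, two-proportion z-test, z = (p_B - p_A) / sqrt(p (1 - p) (1/n_A + 1/n_B)), where p is the pooled conversion rate. The function name `two_proportion_z_test` and the counts are hypothetical, chosen only for this example, and the sketch assumes samples large enough for the normal approximation to be reasonable.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, written via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts, made up for illustration only.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

In practice one would more likely reach for a tested implementation in a statistics package; the point here is only the mapping from formula to code.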