Available immediately
(Updated 2024-03-19.)
Data Engineer
Stockholm, Sweden
English: Beginner
- Python
- Airflow
- Snowflake
Skills (14)
Spark
Python
AWS
Airflow
S3
Azure
Hive
Snowflake
Scala
ADLS
Azure Data Lake
PySpark
EC2
Databricks
Summary
• An astute, innovative, and performance-focused professional with 7 years
of total experience, including around 5 years of robust experience in Data
Engineering.
• 5 years of exclusive experience in Hadoop and its components such as
HDFS, Spark, Hive, Hue, Sqoop, HBase, Oozie, Airflow, and Snowflake.
• Expertise in developing custom Spark components using Scala and Python.
• Conversant with the Databricks platform, Cloudera, Git, Jira, etc.
• Working experience in Microsoft Azure cloud services such as Azure
Databricks, Azure Blob Storage, and Azure Data Lake.
• Good experience using Apache Airflow and Oozie to automate recurring
jobs.
• Worked with Avro, JSON, Parquet, SequenceFile, and ORC file formats.
• Developed shell scripts for automation.
• Good knowledge of Oracle SQL and MySQL.
• Assertive communicator with effective organizational and teamwork
abilities.
• Knowledge of AWS (S3 and EC2).
Professional experience
2021-03 - 2022-12
Developed PySpark scripts to load data from GDB files and feed it as input to the model.
Prepared Scala functions to process the input data.
Loaded the final model results into Databricks tables using Spark.
Designed Python scripts to load database data into the Snowflake data warehouse via its API.
Automated the jobs using Airflow and built the ETL pipeline for the whole application.
Wrote scripts to load GeoJSON files to Azure Blob Storage.
Created design documents for new requirements.
2019-10 - 2021-01
Involved in the development and testing of various modules of the product.
Developed Spark scripts in Scala to process and transform the data.
Used Sqoop to import data from RDBMS sources into HDFS.
Performed transformations, cleaning, and filtering on imported data using Hive, and loaded the final data into external Hive tables for reporting purposes.
Applied partitioning and bucketing in Hive to optimize performance.
Proactively solved a critical performance issue in a set of Hive queries through various HQL optimization techniques, thereby reducing average query time from 10 minutes to 10-15 seconds.
Created Oozie workflows and coordinators to schedule the jobs.
2018-09 - 2019-09
• Analyzed the requirements and prepared the low-level design document.
• Implemented Spark with Scala, utilizing the Spark Core and Spark SQL APIs for faster data processing.
• Used Spark SQL to load the data, created RDDs, loaded the data into Hive tables, and handled structured data with Spark SQL.
• Built Spark scripts in Scala as required.
• Validated the input feed files, processed them using Spark, and loaded them into external Hive tables.
• Created Oozie workflows and coordinators to schedule the jobs.
• Developed shell scripts to generate feed files from the final Hive table data and send them to downstream applications. Also developed shell scripts to trigger workflows and perform cleanup actions automatically, reducing manual effort.
• Fed input files to the application and loaded data into the on-premises cluster.
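As an illustration of the feed-file generation and automated cleanup described above, the following is a minimal stand-alone sketch in Python rather than shell; all function and file names are hypothetical, and the actual scripts exported data from Hive tables:

```python
import csv
from pathlib import Path

def generate_feed(table_rows, out_dir, feed_name):
    """Write rows exported from a final table to a pipe-delimited feed file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    feed_path = out_dir / f"{feed_name}.dat"
    with feed_path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="|")
        writer.writerows(table_rows)
    return feed_path

def cleanup(out_dir, keep=3):
    """Remove all but the newest `keep` feed files, automating manual cleanup."""
    feeds = sorted(Path(out_dir).glob("*.dat"), key=lambda p: p.stat().st_mtime)
    for old in feeds[:-keep]:
        old.unlink()
```

A scheduler (Oozie here, Airflow in the later role) would invoke such steps after the final Hive table is populated.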
2016-04 - 2018-09
Adept at establishing excellent rapport with all levels of management and customers.
Attended Cross-Functional Testing (CFT) sessions with E2E and component designers. Prepared test cases, test plans, and test strategies, and estimated test-case complexity.
Worked with the design/architecture team to understand feature requirements and implementation.
Responsible for creating test cases based on all requirement documents, maintaining the requirements traceability matrix, preparing test strategies for different scenarios per the release schedule, and planning test execution.
Involved in preparing test cases, communicating with developers on quality issues and following up with bug reports, and developing, running, and maintaining test scripts.
Education
2011-01 - 2015-01