Data Engineer Stockholm, Sweden

Available ASAP

(Updated 2024-03-19)

English: Beginner

  • Python
  • Airflow
  • Snowflake

Skills (14)

Spark

Python

AWS

Airflow

S3

Azure

Hive

Snowflake

Scala

ADLS

Azure Data Lake

PySpark

EC2

Databricks

Summary

• An astute, innovative, and performance-focused professional with 7 years of total experience, including around 5 years of robust experience in Data Engineering.

• 5 years of dedicated experience in Hadoop and its components such as HDFS, Spark, Hive, Hue, Sqoop, HBase, Oozie, Airflow, and Snowflake.

• Expertise in developing custom Spark components using Scala and Python.

• Conversant with the Databricks platform, Cloudera, Git, Jira, etc.

• Working experience with Microsoft Azure cloud services such as Azure Databricks, Azure Blob Storage, and Azure Data Lake.

• Good experience using Apache Airflow and Oozie to automate recurring jobs.

• Worked with Avro, JSON, Parquet, SequenceFile, and ORC file formats.

• Developed shell scripts for automation.

• Good knowledge of Oracle SQL and MySQL.

• Assertive communicator with effective organizational and team-working abilities.

• Knowledge of AWS (S3 and EC2).

Professional background

Data Engineer
Tata Consultancy Services

2021-03 - 2022-12

Spark, Spark Core, Spark SQL, Hive, Airflow, Snowflake, Scala, Python, Azure, Eclipse, Databricks, Notebook, PyCharm, Git, Bitbucket, Agile, Jira, Windows

Built a data science model by sourcing data from a census database and combining it with Nielsen internal data to create local trade areas, which are used to generate CSV reports and Snowflake tables.
Developed PySpark scripts to load data from the GDB file format and feed it as input to the model.
Prepared Scala functions to process the input data.
Loaded the final results of the model into Databricks tables using Spark.
Designed Python scripts to load the database into the Snowflake data warehouse via its API.
Automated the jobs using Airflow and built the ETL pipeline for the whole application (a minimal DAG sketch follows below).
Defined scripts to load GeoJSON files to Azure Blob Storage.
Worked on creating design documents for new requirements.
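The Airflow orchestration described above could be wired roughly as in the sketch below. This is a minimal illustration only: the DAG id, task names, and helper functions are hypothetical and not taken from the original project.

```python
# Minimal Airflow sketch of the described ETL: load the GDB source, then
# publish model results to Snowflake. All names here are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_gdb_source(**context):
    # Placeholder: read the .gdb extract (e.g. with PySpark/GeoPandas)
    # and stage it for the model step.
    ...


def publish_to_snowflake(**context):
    # Placeholder: write model output to Snowflake, e.g. via the
    # snowflake-connector-python or Snowpark write APIs.
    ...


with DAG(
    dag_id="trade_area_etl",            # hypothetical DAG id
    start_date=datetime(2021, 3, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_source = PythonOperator(
        task_id="load_gdb_source",
        python_callable=load_gdb_source,
    )
    publish = PythonOperator(
        task_id="publish_to_snowflake",
        python_callable=publish_to_snowflake,
    )
    load_source >> publish              # simple linear dependency
```

In the real pipeline, the Scala processing, Databricks table load, and Azure Blob upload described above would sit as further tasks between these two.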
Data Engineer
Tech Mahindra

2019-10 - 2021-01

Spark, Spark Core, Spark SQL, Hive, Sqoop, Oozie, HDFS, Hadoop, Git, SVN, Jira, Linux, PuTTY, Eclipse, Scala, MySQL, Oracle

The project involved collecting data from RDBMS sources and securing it in an HDFS data lake, applying data transformations, and loading the transformed data into external Hive tables and an Oracle RDBMS for reporting by various downstream systems.
Involved in the development and testing of various modules of the product.
Developed Spark scripts in Scala to process and transform the data.
Used Sqoop to import data from RDBMS sources into HDFS.
Performed transformations, cleaning, and filtering on the imported data using Hive and loaded the final data into external Hive tables for reporting purposes.
Applied partitioning and bucketing in Hive to optimize performance (a layout sketch follows after this list).
Proactively solved a critical performance issue in a set of Hive queries through various HQL optimization techniques, reducing query time from an average of 10 minutes to 10-15 seconds.
Created Oozie workflows and coordinators to schedule the jobs.
Attended design calls with the E2E designer and component designer.
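As an illustration of the partitioning and bucketing mentioned above, the sketch below shows the general table layout expressed through Spark SQL with Hive support. The database, table, and column names are invented; the original project defined its tables directly in Hive.

```python
# Sketch of a partitioned + bucketed Hive table layout; all names invented.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-layout-sketch")
    .enableHiveSupport()            # talk to the Hive metastore
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS reporting")

# Partition by load date so date-bounded queries prune whole directories,
# and bucket by customer_id so joins/aggregations on that key read fewer files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS reporting.transactions_curated (
        customer_id BIGINT,
        amount      DECIMAL(18,2),
        channel     STRING
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A query like this scans only the partitions for the requested dates.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM reporting.transactions_curated
    WHERE load_date >= '2020-12-01'
    GROUP BY customer_id
""").show()
```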
eServe HaaS

2018-09 - 2019-09

Spark, Spark Core, Spark SQL, Hive, Sqoop, Oozie, HDFS, Hadoop, Git, SVN, Jira, Linux, PuTTY, Eclipse, Scala, MySQL, Oracle

• Attended design calls with the E2E designer and component designer.
• Analyzed the requirements and prepared the low-level design document.
• Implemented Spark with Scala, utilizing the Spark Core and Spark SQL APIs for faster data processing.
• Used Spark SQL to load the data, created RDDs, loaded the data into Hive tables, and handled structured data using Spark SQL.
• Built Spark scripts in Scala depending on the requirement.
• Validated the input feed files, processed them using Spark, and loaded them into external Hive tables (see the sketch after this list).
• Created Oozie workflows and coordinators to schedule the jobs.
• Developed shell scripts to generate feed files from the final Hive table data and send them to downstream applications, and further shell scripts to trigger workflows and perform cleanup actions automatically, reducing manual effort.
• Fed input files to the application and loaded data into the on-premises cluster.
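The feed validation and external Hive table load referred to above could look roughly like the following sketch. It is illustrative only: the original work was in Scala, and the paths, schema, and table names here are hypothetical.

```python
# Sketch of feed-file validation and an external Hive table load.
# Paths, schema, and table names are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("feed-validation-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the delimited feed files from the landing area.
feed = (
    spark.read
    .option("header", "true")
    .option("delimiter", "|")
    .csv("/data/inbound/feed_*.dat")
)

# Basic validation: key fields present and the amount parses as a number.
valid = feed.filter(
    F.col("account_id").isNotNull()
    & F.col("amount").cast("double").isNotNull()
)
rejected = feed.subtract(valid)
rejected.write.mode("overwrite").csv("/data/rejected/")   # keep rejects for audit

# External Hive table consumed by downstream reporting.
spark.sql("CREATE DATABASE IF NOT EXISTS reporting")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS reporting.feed_curated (
        account_id STRING,
        amount     DOUBLE,
        feed_date  STRING
    )
    STORED AS ORC
    LOCATION '/warehouse/reporting/feed_curated'
""")

(valid
 .select("account_id", F.col("amount").cast("double"), "feed_date")
 .write.mode("append")
 .insertInto("reporting.feed_curated"))
```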
BT Wholesale

2016-04 - 2018-09

Data (MEAS and HE product) (L2C - Lead to Cash)

Responsible for analyzing system requirements, business requirements, application design documents, application interface design documents, and high-level design documents.
Adept at establishing excellent rapport with all levels of management and customers.
Attended CFT (Cross Functional Testing) sessions with the E2E designer and component designer. Prepared test cases, the test plan, and the test strategy, and estimated test case complexity.
Worked with the design/architecture team to understand the feature requirements and implementation.
Responsible for creating test cases based on all the requirement documents, maintaining the requirement traceability matrix, preparing the test strategy for different scenarios per the release schedule, and planning the test execution.
Involved in preparing test cases, communicating with developers on quality issues and following up with bug reports, and developing, running, and maintaining test scripts.

Academic background

Bachelor of Technology
Jawaharlal Nehru Technological University

2011-01 - 2015-01

Certificates

Databricks Certified Data Engineer Professional
