About the Role
Responsibilities
• Create and maintain optimal data pipeline architecture; assemble large, complex data sets that meet functional / non-functional requirements.
• Design the right schema to support the functional requirements and consumption patterns.
• Design and build production data pipelines from ingestion to consumption.
• Create the necessary preprocessing and postprocessing for various forms of data for training/retraining and inference ingestion as required.
• Create data visualization and business intelligence tools that give stakeholders and data scientists the necessary business and solution insights.
• Identify, design, and implement internal process improvements: automating manual data processes, optimizing data delivery, etc.
• Ensure our data is separated and secure across national boundaries through multiple data centers.
Requirements and Skills
• You should have a bachelor's or master's degree in Computer Science, Information Technology, or another quantitative field.
• You should have at least 8 years of experience as a data engineer supporting large data transformation initiatives related to machine learning, including building and optimizing pipelines and data sets.
• Strong analytical skills related to working with unstructured datasets.
• Experience with Azure cloud services: ADF, ADLS, HDInsight, Databricks, App Insights, etc.
• Experience handling ETL workloads using Spark.
• Experience with object-oriented/functional scripting languages: Python, PySpark, etc.
• Experience with big data tools: Hadoop, Spark, Kafka, etc.
• Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
• You should be a good team player, committed to the success of the team and the overall project.
About the Company