Data Architecture

Assembly Line

📊 Accelerating Innovation at JetBlue Using Databricks

📅 Date:

✍️ Authors: Sai Ravuru, Yared Gudeta

🔖 Topics: Data Architecture

🏭 Vertical: Aerospace

🏢 Organizations: JetBlue, Databricks, Microsoft


Data, and in particular analytics, AI, and ML, is key for airlines to provide a seamless customer experience while keeping operations efficient. A single flight from New York to London, for example, requires hundreds of decisions based on factors encompassing customers, flight crews, aircraft sensors, live weather, and live air traffic control (ATC) data. A large disruption such as a brutal winter storm can impact thousands of flights across the U.S. It is therefore vital for airlines to rely on real-time data, AI, and ML to make proactive, real-time decisions.

JetBlue has sped AI and ML deployments across a wide range of use cases spanning four lines of business, each with its own AI and ML team. The following are the fundamental functions of the business lines:

  • Commercial Data Science (CDS) - Revenue growth
  • Operations Data Science (ODS) - Cost reduction
  • AI & ML engineering - Go-to-market product deployment optimization
  • Business Intelligence - Reporting enterprise scaling and support

Each business line supports multiple strategic products that are prioritized regularly by JetBlue leadership to establish KPIs that lead to effective strategic outcomes.

Read more at Databricks Blog

Why is machine data special and what can you do with it?

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Arch Systems


Production data can unlock opportunities for electronics manufacturing service (EMS) providers to improve operations. Evolving systems for collection and analysis of machine data is vital to those efforts. Though factories produce many different types of usable data, machine data is special because it can be collected without operational burden, creating actionable production insights in real time and automating responses to them.

As more manufacturers develop and deploy machine data collection systems, industry best practices are surfacing, and systems often adopt similar structures in response to common needs in the factory. Most architectures include these key features:

  • There is usually some type of streaming event broker (often called a pub/sub architecture) that receives complex files and reports from production equipment to enable advanced analytics, holistic dashboards and visualization, automated action management, and system monitoring.
  • Systems should be able to integrate data from both advanced machines and legacy equipment, such as PLCs.
  • They use specialized databases and data lakes for storage.
  • Dedicated telemetry and monitoring are deployed to ensure data quality.
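
The pub/sub pattern in the first bullet can be illustrated with a minimal in-process sketch. The `EventBroker` class, the topic name, and the machine identifiers are hypothetical and chosen only for illustration; a production system would use a networked broker rather than this toy class.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBroker:
    """Hypothetical in-process stand-in for a streaming event broker."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = EventBroker()
dashboard_rows: List[dict] = []  # consumer 1: feeds a dashboard
alerts: List[str] = []           # consumer 2: automated action management


def alert_on_error(event: dict) -> None:
    # Record the machine id whenever a machine reports an error status.
    if event.get("status") == "error":
        alerts.append(event["machine_id"])


broker.subscribe("machine/events", dashboard_rows.append)
broker.subscribe("machine/events", alert_on_error)

# Both an advanced machine and a legacy PLC publish to the same topic.
broker.publish("machine/events", {"machine_id": "placer-1", "status": "ok"})
broker.publish("machine/events", {"machine_id": "plc-7", "status": "error"})
```

The point of the design is that machines publish without knowing who consumes the events, so dashboards, alerting, and monitoring can be added as new subscribers without touching the equipment side.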

Read more at Arch Systems Blog

A Data Architecture to assist Geologists in Real-Time Operations

📅 Date:

✍️ Author: Nicola Lamonaca

🔖 Topics: Data Architecture

🏭 Vertical: Petroleum and Coal

🏢 Organizations: Eni, Databricks


Data plays a crucial role in making Eni’s exploration and drilling operations a success all over the world. Our geologists use real-time well data collected by sensors installed on drilling pipes to track key properties during the drilling process and to build predictive models of them.

Data is delivered by a custom dispatcher component designed to connect to a WITSML Server on all oil rigs and send time-indexed and/or depth-indexed data to any supported application. In our case, data is delivered to Azure Data Lake Storage (ADLS) Gen2 as WITSML files, each accompanied by a JSON file for additional custom metadata.
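
As a rough sketch of how a downstream job might match each WITSML file with its JSON metadata file, the helper below pairs files by name. The naming convention (same path and stem, with `.xml` for the WITSML file and `.json` for the metadata) and the function name `pair_with_metadata` are assumptions for illustration, not details given by the article.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional, Tuple


def pair_with_metadata(paths: Iterable[str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each data file to its metadata sidecar, or None if the sidecar is missing.

    Assumes (hypothetically) that a WITSML file and its JSON metadata share
    the same path and stem and differ only in extension.
    """
    by_stem: Dict[str, Dict[str, str]] = {}
    for raw in paths:
        path = PurePosixPath(raw)
        key = str(path.with_suffix(""))  # path without its extension
        by_stem.setdefault(key, {})[path.suffix.lower()] = raw

    pairs: Dict[str, Tuple[str, Optional[str]]] = {}
    for key, files in by_stem.items():
        if ".xml" in files:  # ignore metadata files that have no data file
            pairs[key] = (files[".xml"], files.get(".json"))
    return pairs


# Made-up file names for illustration only.
landed = ["rig-a/log_001.xml", "rig-a/log_001.json", "rig-a/log_002.xml"]
pairs = pair_with_metadata(landed)
```

A file whose metadata has not arrived yet is returned with `None`, so the caller can decide whether to wait or to process it without the custom metadata.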

The visualizations generated from this data platform are used both on the oil rigs and in HQ. Operators explore the curves enriched by the ML models as soon as they’re generated, in a web application built in-house that shows in real time how the drilling is progressing. Historic data can also be explored in the same application.

Read more at Medium

📊 Data pools as the foundation for the smart buildings of the future

📅 Date:

✍️ Authors: Frederik De Meyer, Christian Metzger

🔖 Topics: Building information modeling, Data Architecture

🏢 Organizations: Siemens


Today’s digital building technology generates a huge amount of data. So far, however, this data has only been used to a limited extent, primarily within hierarchical automation systems. Yet data is key to the new generation of modern buildings, making them climate-neutral, energy- and resource-efficient, and at some point autonomous and self-maintaining.

More straightforward is the use of digital solutions for building management by planners, developers, owners, and operators of new buildings. The creation of a building twin must be defined and implemented as a BIM goal. At the heart of it is a Common Data Environment (CDE), a central digital repository where all relevant information about a building can be stored and shared already in the project phase. CDE is a part of the BIM process and enables collaboration and information exchange between the different stakeholders of the construction project.

Beyond the design and construction phases, a CDE can also make building maintenance more effective in the operation phase by providing easy access to essential information about the building and its technical systems. If information about equipment, sensors, their location in the building, and all other relevant components is collected in machine-readable form from the beginning of the lifecycle and updated continuously, building management tools can access this data directly during the operation phase. The explicit goal is to collect data without additional effort. To achieve this, engineering and commissioning tools must in future automatically store their results in the common twin, making reengineering obsolete.

Read more at Siemens Blog

🧠 How a Data Fabric Gets Snow Tires to a Store When You Need Them

📅 Date:

✍️ Author: Susan Hall

🔖 Topics: Supply Chain Control Tower, Data Architecture

🏢 Organizations: American Tire Distributors, Promethium


“We were losing sales because the store owners were unable to answer the customers’ questions as to when exactly they would have the product in stock,” said Ehrar Jameel, director of data and analytics at ATD. The company didn’t want frustrated customers looking elsewhere. So he wanted to create what he called a “supply chain control tower” for data, just like the ones at the airport.

“I wanted to give a single vision, a single pane of glass for the business, to just put in a SKU number and be able to see where that product is in the whole supply chain, not just the supply chain, but in the whole value chain of the company.” ATD turned to Promethium, which provides a virtual data platform automating data management and governance across a distributed architecture with a combination of data fabric and self-service analytics capabilities.

It’s built on top of the open source SQL query engine Presto, which allows users to query data wherever it resides. It normalizes the data for query into an ANSI-compliant standard syntax, whether it comes from Oracle, Google BigQuery, Snowflake or wherever. It integrates with other business intelligence tools such as Tableau and can be used to create data pipelines. It uses natural language processing and artificial intelligence plus something it calls a “reasoner” to figure out, based on what you asked, what you’re really trying to do and the best data to answer that question.
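
The idea of one SQL statement answering a SKU question across separate sources can be sketched with Python’s built-in `sqlite3` module. This is only a stand-in and not Presto or Promethium: two attached in-memory databases play the role of independent systems, and the schema names, table names, SKUs, and values are all made up for illustration.

```python
import sqlite3

# Two attached in-memory databases stand in for two separate source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.execute("ATTACH DATABASE ':memory:' AS shipping")

conn.execute("CREATE TABLE warehouse.stock (sku TEXT, site TEXT, qty INTEGER)")
conn.execute("CREATE TABLE shipping.in_transit (sku TEXT, eta TEXT, qty INTEGER)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO warehouse.stock VALUES (?, ?, ?)",
    [("SNOW-205", "DC-East", 40), ("ALLSEASON-195", "DC-East", 12)],
)
conn.executemany(
    "INSERT INTO shipping.in_transit VALUES (?, ?, ?)",
    [("SNOW-205", "2024-01-15", 100)],
)


def locate_sku(sku: str) -> list:
    """Return (stage, detail, qty) rows for one SKU across both sources."""
    query = """
        SELECT 'in stock' AS stage, site AS detail, qty
        FROM warehouse.stock WHERE sku = ?
        UNION ALL
        SELECT 'in transit' AS stage, eta AS detail, qty
        FROM shipping.in_transit WHERE sku = ?
        ORDER BY stage
    """
    return conn.execute(query, (sku, sku)).fetchall()


rows = locate_sku("SNOW-205")
```

The caller supplies only a SKU and gets one combined view back, which is the “single pane of glass” behavior in miniature; a real data fabric would resolve the sources and dialects behind that single query.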

Read more at The New Stack

A Deeper Look Into How SAP Datasphere Enables a Business Data Fabric

📅 Date:

✍️ Author: Juergen Mueller

🔖 Topics: Partnership, Data Architecture

🏢 Organizations: SAP, Databricks, Collibra, Confluent, DataRobot


SAP announced the SAP Datasphere solution, the next generation of its data management portfolio, which gives customers easy access to business-ready data across the data landscape. SAP also introduced strategic partnerships with industry-leading data and AI companies – Collibra NV, Confluent Inc., Databricks Inc. and DataRobot Inc. – to enrich SAP Datasphere and allow organizations to create a unified data architecture that securely combines SAP software data and non-SAP data.

SAP Datasphere, and its open data ecosystem, is the technology foundation that enables a business data fabric. This is a data management architecture that simplifies the delivery of an integrated, semantically rich data layer over underlying data landscapes to provide seamless and scalable access to data without duplication. It’s not a rip-and-replace model, but is intended to connect, rather than solely move, data using data and metadata. A business data fabric equips any organization to deliver meaningful data to every data consumer, with business context and logic intact. As organizations require accurate data that is quickly available and described in business-friendly terms, this approach enables data professionals to carry the clarity that business semantics provide into every use case.

Read more at SAP News

Rolls-Royce Civil Aerospace keeps its Engines Running on Databricks Lakehouse

Our connected future: How industrial data sharing can unite a fragmented world

📅 Date:

✍️ Author: Peter Herweck

🔖 Topics: Manufacturing Analytics, Data Architecture

🏢 Organizations: AVEVA


The rapid and effective development of the coronavirus vaccines has set a new benchmark for today’s industries, but it is not the only one. Increasingly, savvy enterprises are starting to share industrial data strategically and securely beyond their own four walls, to collaborate with partners, suppliers and even customers.

Worldwide, almost nine out of 10 (87%) business executives at larger industrial companies cite a need for the type of connected data that delivers unique insights to address challenges such as economic uncertainty, unstable geopolitical environments, historic labor shortages, and disrupted supply chains. In fact, executives report in a global study that the most common benefits of having an open and agnostic information-sharing ecosystem are greater efficiency and innovation (48%), increasing employee satisfaction (45%), and staying competitive with other companies (44%).

Read more at AVEVA Perspectives

How Corning Built End-to-end ML on Databricks Lakehouse Platform

📅 Date:

✍️ Author: Denis Kamotsky

🔖 Topics: MLOps, Quality Assurance, Data Architecture

🏢 Organizations: Corning, Databricks, AWS


Specifically for quality inspection, we take high-resolution images to look for irregularities in the cells, which can be predictive of leaks and defective parts. The challenge, however, is the prevalence of false positives due to the debris in the manufacturing environment showing up in pictures.

To address this, we manually brush and blow the filters before imaging. We discovered that by notifying operators of which specific parts to clean, we could significantly reduce the total time required for the process, and machine learning came in handy. We used ML to predict whether a filter is clean or dirty based on low-resolution images taken while the operator is setting up the filter inside the imaging device. Based on the prediction, the operator would get the signal to clean the part or not, thus reducing false positives on the final high-res images, helping us move faster through the production process and providing high-quality filters.
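
The step from prediction to operator signal can be sketched as a simple threshold rule. The function name `parts_to_clean`, the part identifiers, the scores, and the 0.5 threshold are assumptions for illustration; the article does not describe how the model output is turned into a signal.

```python
from typing import Dict, List


def parts_to_clean(dirty_scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the ids of parts whose predicted 'dirty' score meets the threshold.

    dirty_scores maps a part id to a model's estimated probability that the
    filter is dirty (hypothetical output of a low-resolution image classifier).
    """
    for part, score in dirty_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {part!r} must be in [0, 1], got {score}")
    return sorted(part for part, score in dirty_scores.items() if score >= threshold)


# Made-up scores for three filters in one setup batch.
scores = {"filter-01": 0.92, "filter-02": 0.08, "filter-03": 0.50}
to_clean = parts_to_clean(scores)
```

Only the flagged parts are sent for brushing and blowing, which is how the signal shortens the overall process. Raising the threshold would flag fewer parts but risks more debris reaching the high-resolution images.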

Read more at Databricks Blog

How to pull data into Databricks from AVEVA Data Hub