Data Architecture
Assembly Line
Accelerating Innovation at JetBlue Using Databricks
Data, and in particular analytics, AI and ML, is key for airlines to provide a seamless experience for customers while maintaining efficient operations and meeting business goals. For a single flight, for example from New York to London, hundreds of decisions have to be made based on factors encompassing customers, flight crews, aircraft sensors, live weather and live air traffic control (ATC) data. A large disruption such as a brutal winter storm can impact thousands of flights across the U.S. It is therefore vital for airlines to depend on real-time data, AI and ML to make proactive decisions.
JetBlue has accelerated AI and ML deployments across a wide range of use cases spanning four lines of business, each with its own AI and ML team. The fundamental functions of the business lines are:
- Commercial Data Science (CDS) - Revenue growth
- Operations Data Science (ODS) - Cost reduction
- AI & ML engineering โ Go-to-market product deployment optimization
- Business Intelligence โ Reporting enterprise scaling and support
Each business line supports multiple strategic products that are prioritized regularly by JetBlue leadership to establish KPIs that lead to effective strategic outcomes.
Why is machine data special and what can you do with it?
Production data can unlock opportunities for electronics manufacturing service (EMS) providers to improve operations. Evolving systems for collection and analysis of machine data is vital to those efforts. Though factories produce many different types of usable data, machine data is special because it can be collected without operational burden, creating actionable production insights in real time and automating responses to them.
As more manufacturers develop and deploy machine data collection systems, industry best practices are surfacing, and systems often adopt similar structures in response to common needs in the factory. Most architectures include these key features:
- There is usually some type of streaming event broker (often called a pub/sub architecture) that receives complex files and reports from production equipment to enable advanced analytics, holistic dashboards and visualization, automated action management, and system monitoring; a minimal consumer sketch of this pattern follows the list.
- Systems should be able to integrate data from both advanced machines and legacy equipment, such as PLCs.
- They use specialized databases and data lakes for storage.
- Dedicated telemetry and monitoring are deployed to ensure data quality.
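As a concrete illustration of the pub/sub pattern in the first bullet, here is a minimal Python sketch of a consumer subscribing to machine events with Kafka. The broker address, topic name, and message fields are illustrative assumptions, not details from any particular factory system.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical topic and broker address; a real deployment would use the
# plant's own event broker and message schema.
consumer = KafkaConsumer(
    "machine-events",
    bootstrap_servers="broker.factory.local:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Route each event: feed dashboards, trigger automated actions,
    # or flag data-quality issues for the monitoring system.
    if event.get("status") == "fault":
        print(f"Machine {event.get('machine_id')} reported a fault")
```

The same consumer loop can fan events out to dashboards, alerting, and storage, which is why a single broker in the middle keeps the rest of the architecture loosely coupled.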
A Data Architecture to assist Geologists in Real-Time Operations
Data plays a crucial role in making Eni's exploration and drilling operations a success all over the world. Our geologists use real-time well data collected by sensors installed on drilling pipes to keep track of key properties and to build predictive models of them during the drilling process.
Data is delivered by a custom dispatcher component designed to connect to a WITSML server on each oil rig and send time-indexed and/or depth-indexed data to any supported application. In our case, data lands in Azure ADLS Gen2 as WITSML files, each accompanied by a JSON file carrying additional custom metadata.
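As an illustrative sketch (not Eni's production code), a downstream job might pair each landed WITSML file with its JSON metadata sidecar as shown below; the landing path and metadata field names are assumptions. WITSML is XML-based, so the standard library suffices:

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mount point of the ADLS Gen2 landing zone.
landing_dir = Path("/mnt/adls/witsml-landing")

for witsml_path in landing_dir.glob("*.xml"):
    # Each WITSML file arrives with a JSON sidecar carrying custom metadata.
    sidecar_path = witsml_path.with_suffix(".json")
    metadata = json.loads(sidecar_path.read_text())

    tree = ET.parse(witsml_path)
    root = tree.getroot()
    # Index the record by time or depth, depending on how the dispatcher
    # tagged it in the sidecar (field name is illustrative).
    index_type = metadata.get("indexType", "time")
    print(f"{witsml_path.name}: {root.tag}, indexed by {index_type}")
```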
The visualizations generated from this data platform are used both on the oil rigs and at HQ: operators explore the curves enriched by the ML models as soon as they're generated, through an in-house web application that shows in real time how the drilling is progressing. The same application also allows exploring historical data.
Data pools as the foundation for the smart buildings of the future
Today's digital building technology generates a huge amount of data. So far, however, this data has been used only to a limited extent, primarily within hierarchical automation systems. Yet data is key to the new generation of modern buildings, making them climate-neutral, energy- and resource-efficient, and at some point autonomous and self-maintaining.
The use of digital solutions for building management is more straightforward for planners, developers, owners, and operators of new buildings. The creation of a building twin must be defined and implemented as a BIM goal. At its heart is a Common Data Environment (CDE), a central digital repository where all relevant information about a building can be stored and shared from the project phase onward. The CDE is part of the BIM process and enables collaboration and information exchange between the different stakeholders of a construction project.
Beyond the design and construction phases, a CDE can also help make building maintenance more effective during the operation phase by providing easy access to essential information about the building and its technical systems. If information about equipment, sensors, their location in the building, and all other relevant components is collected in machine-readable form from the beginning of the lifecycle and updated continuously, building management tools can access this data directly during operations. The goal is precisely to collect data without additional effort. To achieve this, engineering and commissioning tools must in the future automatically store their results in the common twin, making reengineering obsolete.
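To make "machine-readable from the beginning of the lifecycle" concrete, the sketch below shows the kind of equipment record an engineering or commissioning tool could write into the common twin. The schema is purely illustrative and not drawn from any BIM or CDE standard:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class EquipmentRecord:
    """Illustrative machine-readable record for one piece of building equipment."""
    equipment_id: str
    equipment_type: str   # e.g. "temperature_sensor", "air_handler"
    location: str         # room or zone identifier within the building model
    commissioned: str     # ISO 8601 date written by the commissioning tool

# A commissioning tool could serialize its results straight into the CDE,
# so operations tools can read them later without reengineering.
record = EquipmentRecord(
    equipment_id="AHU-03-T1",
    equipment_type="temperature_sensor",
    location="Level 3 / Zone B",
    commissioned="2023-05-17",
)
print(json.dumps(asdict(record), indent=2))
```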
How a Data Fabric Gets Snow Tires to a Store When You Need Them
"We were losing sales because the store owners were unable to answer the customers' questions as to when exactly they would have the product in stock," said Ehrar Jameel, director of data and analytics at ATD. The company didn't want frustrated customers looking elsewhere. So he wanted to create what he called a "supply chain control tower" for data, just like the ones at the airport.
"I wanted to give a single vision, a single pane of glass for the business, to just put in a SKU number and be able to see where that product is in the whole supply chain; not just the supply chain, but in the whole value chain of the company." ATD turned to Promethium, which provides a virtual data platform that automates data management and governance across a distributed architecture, combining data fabric and self-service analytics capabilities.
It's built on top of the open source SQL query engine Presto, which allows users to query data wherever it resides. It normalizes the data for query into an ANSI-compliant standard syntax, whether it comes from Oracle, Google BigQuery, Snowflake or elsewhere. It integrates with business intelligence tools such as Tableau and can be used to create data pipelines. It uses natural language processing and artificial intelligence, plus something it calls a "reasoner," to figure out, based on what you asked, what you're really trying to do and the best data to answer that question.
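Promethium's implementation is proprietary, but the underlying capability (one ANSI SQL statement spanning multiple Presto catalogs) can be sketched with the Presto Python client. The coordinator host, catalog names, and tables below are hypothetical, chosen only to echo the snow-tire example:

```python
import prestodb  # pip install presto-python-client

# Hypothetical coordinator and catalogs; Presto lets one ANSI SQL statement
# span connectors such as Oracle, BigQuery, or Snowflake.
conn = prestodb.dbapi.connect(
    host="presto.example.internal",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT o.sku, o.warehouse_qty, s.in_transit_qty
    FROM oracle_erp.inventory.stock AS o
    JOIN snowflake_dw.logistics.shipments AS s
      ON o.sku = s.sku
    WHERE o.sku = '205-55R16-SNOW'
""")
for sku, on_hand, in_transit in cur.fetchall():
    print(sku, on_hand, in_transit)
```

This is the "single pane of glass" idea in miniature: one query, one SKU, and visibility across systems that would otherwise each need their own lookup.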
A Deeper Look Into How SAP Datasphere Enables a Business Data Fabric
SAP announced the SAP Datasphere solution, the next generation of its data management portfolio, which gives customers easy access to business-ready data across the data landscape. SAP also introduced strategic partnerships with industry-leading data and AI companies (Collibra NV, Confluent Inc., Databricks Inc. and DataRobot Inc.) to enrich SAP Datasphere and allow organizations to create a unified data architecture that securely combines SAP software data and non-SAP data.
SAP Datasphere, and its open data ecosystem, is the technology foundation that enables a business data fabric: a data management architecture that simplifies the delivery of an integrated, semantically rich data layer over underlying data landscapes to provide seamless and scalable access to data without duplication. It is not a rip-and-replace model; it is intended to connect, rather than solely move, data using data and metadata. A business data fabric equips any organization to deliver meaningful data to every data consumer, with business context and logic intact. Because organizations require accurate data that is quickly available and described in business-friendly terms, this approach lets data professionals carry the clarity of business semantics into every use case.
Rolls-Royce Civil Aerospace keeps its Engines Running on Databricks Lakehouse
Our connected future: How industrial data sharing can unite a fragmented world
The rapid and effective development of the coronavirus vaccines has set a new benchmark for today's industries, but it is not the only one. Increasingly, savvy enterprises are starting to share industrial data strategically and securely beyond their own four walls, to collaborate with partners, suppliers and even customers.
Worldwide, almost nine out of 10 (87%) business executives at larger industrial companies cite a need for the type of connected data that delivers unique insights to address challenges such as economic uncertainty, unstable geopolitical environments, historic labor shortages, and disrupted supply chains. In fact, executives report in a global study that the most common benefits of having an open and agnostic information-sharing ecosystem are greater efficiency and innovation (48%), increasing employee satisfaction (45%), and staying competitive with other companies (44%).
How Corning Built End-to-end ML on Databricks Lakehouse Platform
Specifically for quality inspection, we take high-resolution images to look for irregularities in the cells, which can be predictive of leaks and defective parts. The challenge, however, is the prevalence of false positives caused by debris in the manufacturing environment showing up in the pictures.
To address this, we manually brush and blow the filters before imaging. We discovered that by notifying operators of which specific parts to clean, we could significantly reduce the total time required for the process, and this is where machine learning came in handy. We used ML to predict whether a filter is clean or dirty from low-resolution images taken while the operator is setting up the filter inside the imaging device. Based on the prediction, the operator gets a signal to clean the part or not, reducing false positives on the final high-resolution images, helping us move faster through the production process and deliver high-quality filters.
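The post does not describe the model itself, but a clean-versus-dirty call from low-resolution images is a standard binary image-classification task. Below is a minimal PyTorch sketch under assumed specifics (grayscale 64x64 inputs, a small CNN); Corning's actual architecture is not public:

```python
import torch
import torch.nn as nn

# A deliberately small CNN for low-resolution grayscale setup images,
# classifying each filter as clean (0) or dirty (1). Architecture and
# input size are illustrative assumptions.
class CleanDirtyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 2),  # 64x64 input halved twice -> 16x16
        )

    def forward(self, x):
        return self.head(self.features(x))

model = CleanDirtyClassifier()
batch = torch.randn(8, 1, 64, 64)           # stand-in for low-res setup images
probs = torch.softmax(model(batch), dim=1)  # per-image P(clean), P(dirty)
print(probs.shape)  # torch.Size([8, 2])
```

In production, the predicted class would drive the operator signal: flag the part for brushing when the "dirty" probability crosses a threshold, and let clean parts proceed straight to high-resolution inspection.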