This article is part of VB Lab Insights, presented by Capital One.
AI offers transformative potential, but unlocking its value requires strong data management. AI is built on a solid data foundation that can improve iteratively, creating a flywheel effect between data and AI. This flywheel allows companies to deliver more personalized, real-time solutions that unlock impact for their customers and the business.
Data management in today’s world is not without complexity. Data volumes are soaring; research shows they have doubled in the past five years alone. Yet 68% of the data available to companies goes unused. That data also comes in a wide variety of structures and formats: an estimated 80 to 90% of it is unstructured, which compounds the challenge of putting it to use. And finally, the speed at which data must be delivered to users keeps accelerating. Some use cases require data availability in under 10 milliseconds, or in other words, ten times faster than the blink of an eye.
Today’s data ecosystems are big, diverse and fast, and the AI revolution only raises the stakes for how companies manage and use data.
Fundamentals for great data
The data life cycle is complex and unforgiving, often involving many stages, many hops and many tools. This can lead to disparate ways of working with data, at varying levels of maturity and instrumentation for driving data management.
To give users trustworthy data they can innovate with, companies must first address the fundamental principles of great data management: self-service, automation and scale.
- Self-service means letting users do their jobs with minimal friction. It covers areas such as seamless data discovery, ease of data production and tools that democratize access to data.
- Automation ensures that all core data management capabilities are built into the tools and experiences through which users work with data.
- Scale means data ecosystems must grow with the business, especially in the AI era. Among other considerations, companies must account for the scalability of their technologies, for resilience, and for service-level agreements that define baseline obligations for how data must be managed, along with the mechanisms that enforce those agreements (sketched after this list).
These principles lay the foundation for producing and consuming great data.
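To illustrate the third principle, here is a minimal sketch of what a machine-readable service-level agreement for a dataset might look like, with a simple enforcement hook. Everything in it, from the DataSLA class to the freshness and completeness thresholds, is a hypothetical illustration rather than any real product’s API.

```python
# Hypothetical sketch of a data SLA: baseline obligations for a dataset
# plus an enforcement check. Names and thresholds are invented examples.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DataSLA:
    dataset: str
    owner: str
    freshness_minutes: int   # max age before the data counts as stale
    min_completeness: float  # required fraction of non-null rows

    def enforce(self, last_updated: datetime, completeness: float) -> list[str]:
        """Return a list of violations; an empty list means the SLA holds."""
        violations = []
        age = datetime.now(timezone.utc) - last_updated
        if age > timedelta(minutes=self.freshness_minutes):
            violations.append(f"{self.dataset}: data is stale ({age} old)")
        if completeness < self.min_completeness:
            violations.append(
                f"{self.dataset}: completeness {completeness:.2%} "
                f"below required {self.min_completeness:.2%}"
            )
        return violations

# Example: a transactions table that must refresh every 15 minutes.
sla = DataSLA("transactions", owner="payments-team",
              freshness_minutes=15, min_completeness=0.99)
print(sla.enforce(datetime.now(timezone.utc) - timedelta(minutes=40),
                  completeness=0.97))
```

In practice, checks like enforce() would run inside the automated tooling described above, so violations surface before consumers ever see stale data.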
Producing great data
Data producers are responsible for integrating and organizing data so it can be consumed quickly and efficiently. A well-designed self-service portal can play a key role here, letting producers interact seamlessly with systems across the ecosystem, such as storage, access controls, approvals, versioning and business catalogs. The goal is a unified control plane that hides the complexity of those systems, making data available in the right format, at the right time and in the right place.
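As a rough sketch of the unified control plane idea, the snippet below funnels registration, access control and cataloging through a single publish() call, so producers never integrate with each underlying system directly. ControlPlane, publish() and discover() are invented names, not a real SDK.

```python
# Self-contained sketch of a unified control plane: one call registers a
# dataset, grants access and records a catalog entry. All names invented.

class ControlPlane:
    def __init__(self):
        self.catalog: dict[str, dict] = {}   # versioned dataset -> metadata
        self.acls: dict[str, set[str]] = {}  # versioned dataset -> readers

    def publish(self, dataset: str, schema: dict, owner: str,
                readers: list[str]) -> str:
        """Register, grant access to and catalog a dataset in one step."""
        version = f"{dataset}-v{sum(k.startswith(dataset) for k in self.catalog) + 1}"
        self.catalog[version] = {"schema": schema, "owner": owner}
        self.acls[version] = set(readers)
        return version

    def discover(self, principal: str) -> list[str]:
        """Self-service discovery: dataset versions this principal may read."""
        return [name for name, readers in self.acls.items() if principal in readers]

plane = ControlPlane()
v1 = plane.publish("transactions", schema={"id": "string", "amount": "decimal"},
                   owner="payments-team", readers=["analyst-group"])
print(v1)                               # transactions-v1
print(plane.discover("analyst-group"))  # ['transactions-v1']
```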
To scale and enforce governance, companies can choose between a central platform and a federated model, or even adopt a hybrid approach. A central platform simplifies publishing data and governance rules, while a federated model offers flexibility, using purpose-built SDKs to manage governance and infrastructure locally. The key is to implement consistent mechanisms that ensure automation and scalability, so the company can reliably produce the high-quality data that fuels AI innovation.
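The federated model can be sketched in the same spirit: a purpose-built SDK embeds centrally defined policy in each team’s own pipeline, so governance travels with the producer’s code rather than living only in a central platform. The decorator and the policy below are illustrative assumptions, not a real governance SDK.

```python
# Hypothetical federated-governance sketch: a small SDK decorator that
# enforces central policy locally at write time. All names are invented.
import functools

POLICY = {"forbidden_fields": {"ssn", "raw_email"}}  # centrally defined rules

def governed(write_fn):
    """Block writes containing fields the policy forbids."""
    @functools.wraps(write_fn)
    def wrapper(record: dict):
        leaked = set(record) & POLICY["forbidden_fields"]
        if leaked:
            raise ValueError(f"governance violation: {sorted(leaked)}")
        return write_fn(record)
    return wrapper

@governed
def write_record(record: dict) -> None:
    print("written:", record)

write_record({"customer_id": 1, "amount": 42.0})          # passes the policy
# write_record({"customer_id": 1, "ssn": "123-45-6789"})  # would raise ValueError
```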
Consuming great data
Data consumers, such as data scientists and data engineers, need easy access to reliable, high-quality data for rapid experimentation and development. Simplifying the storage strategy is a foundational step: by centralizing compute on the data lake and using a single storage layer, companies can minimize data sprawl and reduce complexity, letting multiple compute engines consume data from one place.
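A minimal sketch of the single-storage-layer idea, assuming pandas, pyarrow and DuckDB are available: one Parquet file stands in for the lake, and two different compute engines consume the same bytes without a second copy.

```python
# One file in the "lake", two compute engines, zero extra copies.
# Requires pandas + pyarrow and duckdb; the file path is illustrative.
import pandas as pd
import duckdb

# Producer writes once to the single storage layer.
pd.DataFrame({"customer_id": [1, 2, 3],
              "spend": [120.0, 40.5, 88.2]}).to_parquet("transactions.parquet")

# Engine 1: pandas reads the file for ad-hoc analysis.
df = pd.read_parquet("transactions.parquet")

# Engine 2: DuckDB runs SQL directly over the identical bytes.
total = duckdb.query("SELECT SUM(spend) FROM 'transactions.parquet'").fetchone()[0]
print(df.shape, total)
```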
Companies should also adopt a zone strategy to handle diverse use cases. For example, a raw zone can accept loosely governed data and a broad range of file types, including unstructured data, while a curated zone enforces a stricter schema and quality requirements. This setup allows flexibility while preserving governance and data quality. Consumers can use these zones for activities such as creating personal spaces for experimentation or collaborative spaces for team projects.
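As a toy illustration of that zone strategy, the sketch below lands anything in a raw zone but enforces a schema and a basic quality rule on promotion to the curated zone. The schema and the rule are invented for the example.

```python
# Toy zone strategy: the raw zone accepts records as-is; promotion to
# the curated zone is schema-checked. Schema and rules are invented.

RAW_ZONE: list[dict] = []      # loose: any shape of record is accepted
CURATED_ZONE: list[dict] = []  # strict: schema-checked records only

CURATED_SCHEMA = {"customer_id": int, "amount": float}

def land_raw(record: dict) -> None:
    """Raw zone: land the record with no schema requirements."""
    RAW_ZONE.append(record)

def promote(record: dict) -> bool:
    """Curated zone: admit the record only if it matches the schema."""
    ok = (set(record) == set(CURATED_SCHEMA)
          and all(isinstance(record[k], t) for k, t in CURATED_SCHEMA.items())
          and record.get("amount", 0) >= 0)  # example quality rule
    if ok:
        CURATED_ZONE.append(record)
    return ok

land_raw({"customer_id": 1, "amount": 120.0, "note": "free-form"})  # fine in raw
print(promote({"customer_id": 1, "amount": 120.0}))  # True: meets the schema
print(promote({"customer_id": "x", "amount": -5}))   # False: fails the checks
```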
Automated services handle data access, life cycle management and compliance, letting users innovate with confidence and speed.
The way forward
Effective AI strategies are built on robust, well-designed data ecosystems. By simplifying how data is produced and consumed, and by improving the quality of that data, companies can empower users to confidently drive innovation into new areas.
As a foundation, it is essential that companies prioritize ecosystems and processes that improve reliability and accessibility. By implementing the principles described above, they can do exactly that: scalable, enforceable data management that will fuel rapid AI experimentation and, ultimately, deliver lasting business value.
Marty Andolino is VP, software engineering at Capital One
Kajal Wood is Sr. Director, software engineering at Capital One
VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and it is always clearly marked. For more information, contact