Data Cataloging Pilot with Fusion HCI
Unlock the hidden value in your enterprise data by creating a unified, intelligent data catalog.

The Strategic Challenge: The Unstructured Data Dilemma
In the modern enterprise, unstructured data (files, objects, documents, and logs) is commonly estimated to account for over 80% of all information, and it continues to grow rapidly. This data is often fragmented across disparate storage silos, which makes it hard to manage, classify, and understand. The result is unnecessary storage cost, compliance and security risk, and a heavily underutilized asset. Without a unified, intelligent view, this data landscape remains a costly black box and a primary bottleneck for the analytics and AI initiatives that drive competitive advantage.
The Solution: An End-to-End Data Cataloging Platform
The Li9 Data Cataloging Pilot is a turnkey service designed to solve this challenge. We deploy and operationalize a complete, modern data cataloging platform, powered by the IBM Fusion HCI appliance. This container-native metadata management solution is engineered to provide deep data insight across exabyte-scale, heterogeneous storage environments.
Our expert-led engagement delivers a complete hardware and software stack that connects to your disparate data sources to rapidly ingest, consolidate, and index metadata for billions of files and objects. This creates a rich, searchable metadata layer that empowers your administrators, data stewards, and data scientists to efficiently manage, classify, and govern massive data stores from a single point of control.
Core Technology: A Container-Native, Service-Based Architecture
The platform we deliver is built on a robust, service-based architecture designed for enterprise scale and performance.
- IBM Fusion HCI System: The foundation is a pre-engineered appliance that provides the underlying Red Hat OpenShift cluster, high-performance storage, and redundant networking. This turnkey private cloud hosts the Data Cataloging service and ensures enterprise-grade reliability.
- Data Cataloging Service: A suite of containerized microservices running on OpenShift provides the tools for connecting to data sources, managing data policies, and visualizing metadata through an intuitive user interface.
- Active File Management (AFM) Nodes: The strategic value of this solution is significantly amplified by AFM nodes within the Fusion HCI appliance. These intelligent data gateways function as a bridge to your broader enterprise data landscape, including popular NFS filers and S3-compliant object stores. This allows the platform to index and catalog data in-place, without requiring disruptive, upfront data migrations.
- Metadata Processing Pipeline: When a data source is scanned, its metadata is ingested and processed in real time. A policy engine applies automated tagging and classification before the metadata is indexed and stored in a high-performance data warehouse, so that it becomes searchable by your teams as soon as it is indexed.
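To make the pipeline concrete, the scan, tag, and index flow can be sketched in a few lines of Python. This is an illustrative model only, not the platform's API: the FileRecord and Policy types, the apply_policies and build_index helpers, and the sample paths are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileRecord:
    """Hypothetical metadata record produced by scanning a data source."""
    path: str
    size_bytes: int
    owner: str
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Policy:
    """Hypothetical policy: if `matches` is true, set tag_key to tag_value."""
    name: str
    matches: Callable[[FileRecord], bool]
    tag_key: str
    tag_value: str


def apply_policies(records: list[FileRecord], policies: list[Policy]) -> list[FileRecord]:
    """Tag every record with each policy whose predicate matches it."""
    for record in records:
        for policy in policies:
            if policy.matches(record):
                record.tags[policy.tag_key] = policy.tag_value
    return records


def build_index(records: list[FileRecord]) -> dict[tuple[str, str], list[str]]:
    """Build a simple inverted index from (tag key, tag value) to file paths."""
    index: dict[tuple[str, str], list[str]] = {}
    for record in records:
        for key, value in record.tags.items():
            index.setdefault((key, value), []).append(record.path)
    return index


# Example run with made-up records and policies.
scanned = [
    FileRecord("/proj/apollo/design.docx", 1_000, "alice"),
    FileRecord("/hr/payroll.csv", 2_000, "bob"),
]
rules = [
    Policy("project-apollo", lambda r: r.path.startswith("/proj/apollo/"), "project", "apollo"),
    Policy("hr-restricted", lambda r: r.path.startswith("/hr/"), "sensitivity", "restricted"),
]
tag_index = build_index(apply_policies(scanned, rules))
```

Looking up tag_index[("sensitivity", "restricted")] then returns the paths tagged by the second rule. A production catalog would store this index in its data warehouse rather than in memory.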
Deliverables
Upon completion of the Data Cataloging Pilot, you will receive:
- A Fully Deployed IBM Fusion HCI Appliance: A production-ready, enterprise-grade private cloud platform, professionally installed and configured in your data center.
- An Operational Data Cataloging Service: A validated and healthy installation of the Data Cataloging service, running on the Fusion HCI’s Red Hat OpenShift cluster.
- Connected Data Sources: The catalog will be connected to your key internal and external data sources, with initial metadata scans completed.
- Baseline Policies and Tags: A foundational set of custom policies and metadata tags, developed with your team, to begin automating data classification.
- Comprehensive Documentation: Detailed architectural diagrams, configuration guides, and operational runbooks for your new environment.
- Empowered Staff: Hands-on knowledge transfer sessions that prepare your team to manage, use, and extend the data catalog.
The Li9 Implementation Journey: An Expert-Led, Phased Approach
We provide a structured, end-to-end engagement to ensure your success from day one.
Phase 1: Platform Deployment & Readiness
Our engagement begins with a comprehensive assessment of your data center environment. Li9’s experts then manage the complete turnkey installation of the IBM Fusion HCI appliance, establishing a robust, enterprise-grade private cloud foundation that is optimized for data-intensive workloads.
Phase 2: Service Enablement & Validation
Once the foundation is in place, our certified engineers lead the installation and validation of the Data Cataloging service onto the Fusion HCI platform. We conduct rigorous checks to ensure all components are healthy, integrated, and performing optimally before proceeding.
Phase 3: Configuration & Operationalization
This is where value is unlocked. We guide your team through connecting the catalog to your key data sources, including external file and object stores via the integrated AFM nodes. We help you run the initial metadata scans and work with your data stewards to define a baseline set of custom policies and tags (e.g., tagging by project, data sensitivity, or retention period) to begin classifying your data immediately.
Phase 4: Knowledge Transfer & Handover
A solution is only successful if your team is empowered to use it. We provide comprehensive training sessions and detailed documentation, ensuring your data administrators and stewards have the skills and confidence to manage, customize, and scale your new data catalog long-term.
The Business Value Proposition: Why Catalog Your Data?
This pilot delivers tangible business outcomes by turning your data from a liability into a strategic asset.
Optimize Storage and Reduce Costs
- Eliminate ROT Data: Identify and eliminate Redundant, Obsolete, and Trivial (ROT) data that consumes expensive storage capacity, directly increasing storage efficiency and reducing backup windows.
- Automate Data Tiering: Use policies to automatically move “cold” or less-frequently accessed data to lower-cost storage tiers, optimizing storage expenditures without manual intervention.
- Improve Administrator Productivity: By automating data discovery and management tasks, the platform reduces the manual effort required by storage administrators, freeing them for more strategic initiatives.
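As a rough illustration of how catalog metadata can surface ROT candidates, the sketch below flags duplicate files by checksum and stale files by last-access time. It assumes each metadata record carries a path, a content checksum, and a last-access timestamp. The record layout, the find_rot_candidates function, and the 730-day threshold are hypothetical choices for this example, not platform defaults.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_rot_candidates(records, now, max_idle_days=730):
    """Return paths that look redundant (duplicate checksum) or obsolete (stale).

    `records` is a list of dicts with hypothetical keys:
    "path", "checksum", and "last_access" (a datetime).
    """
    # Group paths by content checksum; identical checksums suggest duplicates.
    by_checksum = defaultdict(list)
    for rec in records:
        by_checksum[rec["checksum"]].append(rec["path"])

    # Keep the first path (alphabetically) of each group and flag the rest.
    redundant = sorted(
        path
        for paths in by_checksum.values()
        if len(paths) > 1
        for path in sorted(paths)[1:]
    )

    # Anything not accessed since the cutoff is a candidate for review.
    cutoff = now - timedelta(days=max_idle_days)
    obsolete = sorted(rec["path"] for rec in records if rec["last_access"] < cutoff)

    return {"redundant": redundant, "obsolete": obsolete}
```

The output is a review list, not a deletion list: in practice a data steward would confirm candidates before anything is removed or moved to a lower-cost tier.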
Accelerate Analytics and AI
- Accelerate Data Discovery: A primary bottleneck in AI and analytics projects is the time spent finding and preparing data. A centralized, searchable catalog allows data scientists to quickly pinpoint and activate the precise datasets they need, dramatically reducing data preparation time.
- Enable AI at Scale: For AI workloads, such as those using IBM watsonx, a well-curated catalog is essential for rapidly identifying relevant data for model training. The platform can integrate with enterprise catalogs and query engines to create a seamless data pipeline for AI projects.
- Uncover Hidden Value: By indexing and classifying all data, the platform helps uncover hidden data value and makes previously unknown datasets available for analysis, driving new business insights.
Enhance Governance and Mitigate Risk
- Identify and Classify Sensitive Data: The platform’s policy engine can be configured to automatically perform data inspection and classification, identifying and applying tags to sensitive data (e.g., PII, PHI) wherever it resides.
- Automate Governance Policies: Custom tags and policies help ensure that data is managed in compliance with internal and external governance mandates like GDPR, CCPA, and HIPAA.
- Reduce “Dark Data” Risk: By shining a light on massive, unmanaged data stores, the service helps reduce the inherent security and compliance risks associated with “dark data.”
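For a sense of what content-based classification involves, here is a minimal Python sketch that tags text containing email addresses or US Social Security number patterns. The two regular expressions and the classify_text function are simplified assumptions for illustration. They do not represent the platform's inspection logic, and real PII or PHI detection needs far more robust rules and validation.

```python
import re

# Deliberately simple, illustrative patterns; real detectors are stricter.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_text(text: str) -> list[str]:
    """Return the sorted names of all patterns found in the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

The returned names could then be attached to the file's metadata record as sensitivity tags, which governance policies can act on.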
Want to learn more?
Download our Data Cataloging brief for a deeper dive into the architecture and capabilities.