{"id":89438,"date":"2023-05-03T10:15:00","date_gmt":"2023-05-03T10:15:00","guid":{"rendered":"https:\/\/cloudnewshub.com\/?p=89438"},"modified":"2023-05-03T10:15:00","modified_gmt":"2023-05-03T10:15:00","slug":"data-classification-tools-what-they-do-and-who-makes-them","status":"publish","type":"post","link":"https:\/\/cloudnewshub.com\/?p=89438","title":{"rendered":"Data classification tools: What they do and who makes them"},"content":{"rendered":"<div><img decoding=\"async\" src=\"http:\/\/cloudnewshub.com\/wp-content\/uploads\/2023\/05\/data-classification-tools-what-they-do-and-who-makes-them.jpg\" class=\"ff-og-image-inserted\"><\/div>\n<p>Data classification is an essential pre-requisite to <a href=\"https:\/\/www.computerweekly.com\/resources\/Data-protection-regulations-and-compliance\">data protection, security and compliance<\/a>. Firms need to know where their data is and the types of data they hold.<\/p>\n<p>Organisations also need to classify data to ensure it has <a href=\"https:\/\/www.techtarget.com\/searchdatabackup\/tip\/Use-data-classification-to-protect-data-aid-backup-compliance\">the right level of protection<\/a> and whether it is stored on the most suitable type of storage in terms of <a href=\"https:\/\/www.computerweekly.com\/feature\/Storage-performance-metrics-Five-key-areas-to-look-at\">cost and access time<\/a>.<\/p>\n<p>Data classification checks for <a href=\"https:\/\/www.techtarget.com\/searchsecurity\/definition\/personally-identifiable-information-PII\">personally identifiable information (PII)<\/a>. It may also classify intellectual property or sensitive financial and strategy information. Also, data classification will provide basic information such as data format, when last accessed, access controls, etc. 
Finally, data classification will often form part of large-scale analytics work, such as in data lakes.<\/p>\n<p>\u201cThe idea of a classification scheme is to be able to qualify the sensitivity or the importance of data to an organisation,\u201d says David Adams, GRC security consultant at Prism Infosec. \u201cApplying meaningful data classification allows an organisation to be able to understand its sensitive data and apply appropriate controls.\u201d<\/p>\n<section class=\"section main-article-chapter\" data-menu-title=\"Data classification and data management\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Data classification and data management<\/h3>\n<p>Increasingly, organisations have invested in dedicated tools to classify datasets as they are ingested, as well as to scan stored data for sensitive information and to create <a href=\"https:\/\/www.techtarget.com\/searchdatamanagement\/feature\/16-top-data-catalog-software-tools-to-consider-using\">data catalogues<\/a> and business glossaries. These, in turn, help with security, data management and data quality. This tools-based approach is replacing the custom scripts that enterprises have often relied on for <a href=\"https:\/\/www.techtarget.com\/searchdatamanagement\/post\/Successful-data-analytics-starts-with-the-discovery-process\">data discovery<\/a>.<\/p>\n<p>Suppliers have also turned to natural language-based systems to make data management easier for non-specialists, and to automation via machine learning and artificial intelligence (AI). This is in response to the growing volumes of data that organisations need to process, and the growth in <a href=\"https:\/\/www.computerweekly.com\/feature\/Unstructured-data-and-the-storage-it-needs\">unstructured data<\/a>.<\/p>\n<p>But it is also a response to compliance pressures. 
Automated systems are less prone to human error, and can be invaluable in tracking down incorrectly classified or inadequately protected datasets.<\/p>\n<p>Gartner points out that manual data classification is cumbersome and prone to inconsistencies. And the growth of data volumes, alongside greater use of unstructured data, is making it almost impossible to carry out the task manually.<\/p>\n<p>But data classification is critical for IT strategy, governance and compliance, and also for a business\u2019s risk tolerance. If an organisation lacks an accurate record of its data, it will not have an accurate view of its risk. This can leave critical data sources unprotected or, as Gartner warns, can result in \u201cover-classification\u201d of data and an unnecessary burden on the organisation.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Tools or platforms?\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Tools or platforms?<\/h3>\n<p>Data classification tools come as standalone \u2013 typically data cataloguing \u2013 products, or as part of broader data quality or data management toolsets. Also, they can form part of a business intelligence (BI) or enterprise software application.<\/p>\n<p>Some suppliers, including Microsoft and SAP, provide data classification as a service. Also, there is a trend towards \u201cserverless\u201d offerings from other suppliers that remove the need for users to configure IT infrastructure. This is especially useful for cloud-based workloads, but is not restricted to them.<\/p>\n<p>Most suppliers claim at least some machine learning (ML) or AI capabilities to automate the data classification process. 
Some also provide data classification as part of a broader data quality toolset.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Tools round-up\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Tools round-up<\/h3>\n<p>Providers of data classification tools include business analytics suppliers, database and infrastructure companies, application software suppliers, cloud providers and niche specialists. There are also several open source options.<\/p>\n<p>Unsurprisingly, IBM, Microsoft, Oracle and SAP all have a presence in the market.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"IBM\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>IBM<\/h3>\n<p>IBM\u2019s Watson Knowledge Catalog works with the vendor\u2019s InfoSphere Information Governance Catalog for data discovery and governance. It has more than 30 connectors to other applications, uses a common business glossary, and was designed to use AI and ML.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Microsoft\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Microsoft<\/h3>\n<p>Microsoft\u2019s Purview Data Catalog also uses an enterprise data catalogue, and is part of the Purview data governance, compliance and risk management service Microsoft offers through its Azure cloud platform.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"SAP\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>SAP<\/h3>\n<p>SAP offers document classification as a service through its cloud operations or as part of its AI business services. 
It also has an AI-powered Data Attribute Recommendation service to automatically classify master data.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Oracle\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Oracle<\/h3>\n<p>Oracle offers its Cloud Infrastructure Data Catalog to provide a metadata management cloud service to build an inventory of assets and a business glossary. It includes AI technology as well as discovery capabilities.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Informatica\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Informatica<\/h3>\n<p>Data management supplier Informatica offers its Enterprise Data Catalog tool. This is an ML-based tool that can scan data and classify it across local and cloud storage. It also works with BI tools and third-party metadata catalogues.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Qlik\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Qlik<\/h3>\n<p>Analytics and BI company Qlik has built up its data classification tools in recent years, including via its acquisition of <a href=\"https:\/\/www.techtarget.com\/searchbusinessanalytics\/news\/252445579\/Qlik-Podium-acquisition-aims-to-boost-BI-data-management?_gl=1*1f2sx5t*_ga*NTY5MjkxNDMyLjE2ODE2NjYwMDY.*_ga_TQKE4GS5P9*MTY4MTY2NjAwNS4xLjEuMTY4MTY2NjAxNy4wLjAuMA..*_ga_RZDF13FDNT*MTY4MTY2NjAwNS4xLjEuMTY4MTY2NjAxNy4wLjAuMA..*_ga_NLDTRJGG3Y*MTY4MTY2NjAwNS4xLjEuMTY4MTY2NjAxNy4wLjAuMA..*_ga_H4TNQB84WS*MTY4MTY2NjAwNS4xLjEuMTY4MTY2NjAxNy4wLjAuMA..*_ga_7FK328ZGNW*MTY4MTY2NjAwNS4xLjEuMTY4MTY2NjAxNy4wLjAuMA..&amp;_ga=2.5402160.2089199953.1681666006-569291432.1681666006\">Podium<\/a>, which added data preparation, quality and management tools. 
The data cataloguing part of Qlik\u2019s Data Integration platform aims to work closely with its BI and analytics tools, but can also exchange data with other applications and catalogues.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Tableau\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Tableau<\/h3>\n<p>Tableau takes a similar approach, putting its Catalog tool in its data management suite. This is an add-on to its analytics platform. The tool ingests information from Tableau datasets into its catalogue, and offers application programming interfaces (APIs) that can bring in data from other applications.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Google\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Google<\/h3>\n<p>Google\u2019s Cloud Data Catalog, despite its name, is a managed data discovery service that works across cloud and on-premise data stores. It integrates with Google\u2019s identity and access management and data loss prevention tools, and is \u201cserverless\u201d so users do not have to configure infrastructure.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Amazon Web Services\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Amazon Web Services<\/h3>\n<p>AWS provides its data catalogue through Glue, a managed ETL (extract, transform and load) service. Glue Data Catalog works across a range of AWS services, including AWS Lake Formation, as well as with open source Apache Hive data warehouses.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Ataccama\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Ataccama<\/h3>\n<p>Ataccama One is the supplier\u2019s data management and governance platform, and features in Gartner\u2019s Magic Quadrant for data quality solutions. 
Its Data Catalog module automates data discovery and change detection and works with databases, data lakes and file systems. The supplier\u2019s emphasis is on data quality improvement.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"Collibra\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>Collibra<\/h3>\n<p>Collibra is also rated by Gartner in its Magic Quadrant, and is a data intelligence cloud platform based around an ML-based data catalogue. The data catalogue has pre-built integration with business applications, BI and data stores. It claims users can search data stores using the tool, without the need to learn SQL.<\/p>\n<\/section>\n<section class=\"section main-article-chapter\" data-menu-title=\"DataHub and Apache Atlas\">\n<h3 class=\"section-title\"><i class=\"icon\" data-icon=\"1\"><\/i>DataHub and Apache Atlas<\/h3>\n<p>DataHub originated at LinkedIn as a metadata search and discovery tool, and went open source in 2020. But perhaps the most widely supported open source tool is Apache Atlas, which offers data cataloguing, metadata management and data governance.<\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Data classification is an essential pre-requisite to data protection, security and compliance. Firms need to know where their data is and the types of data they hold. 
Organisations also need to classify data to ensure it has the right level of protection and whether it is stored on the most suitable type of storage in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":89439,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[533],"tags":[],"class_list":["post-89438","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-it"],"_links":{"self":[{"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=\/wp\/v2\/posts\/89438","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=89438"}],"version-history":[{"count":0,"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=\/wp\/v2\/posts\/89438\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=\/wp\/v2\/media\/89439"}],"wp:attachment":[{"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=89438"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=89438"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudnewshub.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=89438"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}