Amplifying ClickHouse Capacity

Scaling Insights: ClickHouse's Multi-Tier Storage Innovation

Introduction

In today's data-centric landscape, efficiently managing and analyzing large datasets is essential. That's where ClickHouse steps in - a robust open-source database system built to handle massive data volumes with lightning-fast query speeds. But as data grows, scalability becomes crucial.

In this blog, we explore multi-tier storage - a smart approach to organizing data across different storage tiers based on usage patterns. By categorizing data this way, organizations can cut storage costs, boost performance, and expand their data capabilities. We then see how multi-tier storage enhances ClickHouse's scalability and capacity, empowering organizations to maximize their data's value.

Understanding Multi-Tier Storage

In traditional storage setups, data usually sits on one type of storage, like speedy SSDs or budget-friendly HDDs. But for modern data tasks, this simple setup can fall short.

That's where multi-tier storage steps in. Instead of treating all data the same, it sorts data into different "tiers" based on how often it's used or how old it is.

The idea

The goal is to better balance performance, cost, and scalability: store less frequently used or older data on cheaper storage such as HDDs or cold storage, while keeping frequently accessed data on faster storage such as SSDs.

The Need for Scalability

In today's digital world, data is exploding from various sources like social media and IoT devices. To stay ahead, businesses need scalable data management solutions. Traditional methods, like monolithic storage, struggle to keep up with this data avalanche, leading to slow performance and high costs.

Organizations want to store more data, retain it for longer, and query it faster. But usually, the more data is stored, the higher the query latency becomes for the same query conditions. As data keeps growing, organizations face a tough challenge: storing and analyzing huge amounts of data effectively and affordably. Without scalable solutions, they risk lagging behind competitors.

That's where ClickHouse's multi-tier storage comes in. It offers a scalable and cost-effective way to manage expanding data. With multi-tier storage, organizations can handle growing data volumes while keeping performance high and costs low.

By adopting multi-tier storage, businesses can overcome the limitations of traditional methods and boost their data analytics capabilities for future success.

Key Benefits

By leveraging ClickHouse's multi-tier storage feature, organizations can achieve several key benefits:

  • Cost Efficiency: Data is categorized based on access frequency and retention policies, which optimizes storage costs by placing less frequently accessed data on cheaper storage tiers.

  • Performance Enhancement: ClickHouse ensures that frequently accessed data is stored on high-performance storage tiers. This results in quicker query responses and improved analytics performance.

  • Scalability: ClickHouse's multi-tier storage allows organizations to smoothly expand their data infrastructure by adding more storage tiers as needed. This ensures they can handle increasing data volumes without compromising performance or overspending.

  • Data Lifecycle Management: ClickHouse automates the movement of data between storage tiers, following predefined policies based on factors like access frequency, data age, and business importance. This streamlines data lifecycle management tasks for organizations.

Technical Deep Dive

Terms

  • Hot data: Frequently accessed or recently created data.

  • Cold data: Less frequently accessed data, which can be stored on a cheaper cold-data disk to save storage costs while still remaining available for queries.

  • Table Engine: Determines how a table stores and processes its data. In particular, it controls:

    • How and where data is stored, where to write it, and where to read it from.

    • Which queries are supported, and how they are executed.

    • Concurrent data access.

    • Use of indexes, if present.

    • Whether multithreaded request execution is possible.

    • Data replication parameters.

  • Storage Policy: Defines the rules for how and where a table's data is written. A policy organizes disks into volumes and determines the order in which data is written to the disks within each volume, as well as how data moves between volumes when there is more than one (see the example after this list).
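
As a quick illustration, a MergeTree table can be pinned to a named policy at creation time. The database and table names below are placeholders, and hot_cold_policy is the policy we actually define later in this post:

CREATE TABLE example_db.events
(
    event_id UInt64,
    payload String,
    created_at DateTime
) ENGINE = MergeTree()
ORDER BY event_id
SETTINGS storage_policy = 'hot_cold_policy';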

Architecture

The architecture of ClickHouse's multi-tier storage feature is designed to be flexible, robust, and easy to configure.

ClickHouse supports various table engines, such as MergeTree. In this blog, we will focus on the MergeTree table engine. In ClickHouse, each MergeTree table comes with a storage policy.

ClickHouse always includes a disk named default, which points to the data directory specified in the config.xml file. Correspondingly, there's a policy named default as well.

If a MergeTree table doesn't have a specific storage policy assigned to it, ClickHouse automatically applies the default policy and writes data to the default disk. This ensures that data storage follows a consistent and manageable approach across all tables in ClickHouse.
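
A quick way to check which policy a table is using is the storage_policy column in system.tables; tables without an explicit policy report default. For example, once the tables from the next section exist:

SELECT
    name,
    storage_policy
FROM system.tables
WHERE database = 'production_db'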

Experimentation

Implementing a hot-cold architecture using two storage disks:

  • SSD (to store 1 week’s data)

  • HDD (to store older data)

ClickHouse Setup

  1. Install Docker on Ubuntu following https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04

  2. Install ClickHouse in a Docker container following https://clickhouse.com/docs/en/install

  3. Start the ClickHouse server and open the ClickHouse client

  4. Create a database: production_db using

     CREATE DATABASE production_db;
    
  5. Setup a table: health_records using

     CREATE TABLE production_db.health_records (
         patient_id UInt64,
         patient_name String,
         age UInt8,
         gender Enum8('Male' = 1, 'Female' = 2, 'Other' = 3),
         diagnosis String,
         date_of_visit Date,
         doctor_id UInt32,
         created_at DateTime DEFAULT now()
     ) ENGINE = MergeTree()
     PARTITION BY toYYYYMM(created_at)
     ORDER BY (patient_id, date_of_visit);
    
  6. Generate 100 rows using the script below

     from clickhouse_driver import Client
     from faker import Faker
     import random

     # Connect to the local ClickHouse server (adjust credentials as needed)
     client = Client('localhost', port=9000, user='default', password='password')
     fake = Faker()

     def generate_and_insert_data(num_rows):
         for _ in range(num_rows):
             patient_id = random.randint(1, 1000)
             patient_name = f"Patient_{patient_id}"
             age = 22  # fixed age for simplicity
             gender = random.choice(['Male', 'Female', 'Other'])
             diagnosis = random.choice(['Fever', 'Cold', 'Injury', 'Diabetes', 'Hypertension'])

             # Spread visit and creation times over the last two years so the
             # rows land in different monthly partitions
             date_of_visit = fake.date_time_between(start_date='-2y', end_date='now')
             created_at = fake.date_time_between(start_date='-2y', end_date='now')

             doctor_id = random.randint(1000, 2000)

             client.execute('INSERT INTO production_db.health_records VALUES', [{
                 'patient_id': patient_id,
                 'patient_name': patient_name,
                 'age': age,
                 'gender': gender,
                 'diagnosis': diagnosis,
                 'date_of_visit': date_of_visit,
                 'doctor_id': doctor_id,
                 'created_at': created_at
             }])

     num_rows = 100
     generate_and_insert_data(num_rows)
    
  7. Verify the number of rows using

     SELECT COUNT(*) FROM production_db.health_records;
    

Defining disks and volumes using those disks

  • Create a directory to act as the SSD: mkdir /mnt/ssd/

  • Create a directory to act as the HDD: mkdir /mnt/hdd/

  • Give the necessary permissions so that both disk paths are writable by the ClickHouse user (for example, by changing their ownership to the clickhouse user).

In a real-world scenario, these paths would typically be mount points for actual SSD and HDD devices.

Create a new file named storage.xml (or any file name) that defines your disks, then define volumes that use your disks. Place the XML file in /etc/clickhouse-server/config.d/ to have the configuration applied to your system:

<clickhouse>
        <storage_configuration>
                <disks>
                        <hot_disk>
                                <path>/mnt/ssd/</path>
                                <keep_free_space_bytes>5242880</keep_free_space_bytes>
                        </hot_disk>
                        <cold_disk>
                                <path>/mnt/hdd/</path>
                                <keep_free_space_bytes>5242880</keep_free_space_bytes>
                        </cold_disk>
                </disks>
                <policies>
                        <hot_cold_policy>
                                <volumes>
                                        <hot_volume>
                                                <disk>hot_disk</disk>
                                        </hot_volume>
                                        <cold_volume>
                                                <disk>cold_disk</disk>
                                        </cold_volume>
                                </volumes>
                        <perform_ttl_move_on_insert>false</perform_ttl_move_on_insert>
                        </hot_cold_policy>
                </policies>
        </storage_configuration>
</clickhouse>

Here, we define two disks that point to folders ClickHouse can read from and write to, plus a storage policy named hot_cold_policy that organizes them into a hot volume and a cold volume.

Some configurations that might be useful:

  • max_data_part_size_bytes (UInt64): Maximum size of a data part that can be stored on the volume's disks (0 - no limit).

  • move_factor (Float64): Ratio of free disk space on the volume. When the amount of free space falls below this ratio, ClickHouse starts moving data to the next volume in order. The default is 0.1.

  • perform_ttl_move_on_insert (UInt8): Disables TTL moves on data part INSERT when set to 0 (false). By default, if we insert a data part that has already expired according to the TTL move rule, it immediately goes to the volume/disk declared in the move rule. This can significantly slow down inserts when the destination volume/disk is slow.

Verifying the disks

Restart the ClickHouse server, then check that the new disks are visible:

SELECT
    name,
    path,
    free_space,
    total_space
FROM system.disks

Verifying the volumes

SELECT *
FROM system.storage_policies

Modifying Storage Policy for the table

ALTER TABLE production_db.health_records
    MODIFY SETTING storage_policy = 'hot_cold_policy'
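
To confirm that the setting took effect, the table definition should now show SETTINGS storage_policy = 'hot_cold_policy':

SHOW CREATE TABLE production_db.health_records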

Adding TTL rule to move data between hot and cold volumes

Each piece of data is moved to the cold volume 7 days after its created_at timestamp (i.e., the expiration time on the hot tier is 7 days).

ALTER TABLE production_db.health_records
    MODIFY TTL created_at TO VOLUME 'hot_volume', created_at + toIntervalDay(7) TO VOLUME 'cold_volume'

Verifying if older data was moved to cold_disk after setting the TTL rule

SELECT
    name,
    disk_name
FROM system.parts
WHERE (table = 'health_records') AND (active = 1)

We can see that the older data has been moved to cold_disk, while the recent data is stored in hot_disk.
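
To get a per-disk summary instead of inspecting individual parts, the part sizes can be aggregated from system.parts:

SELECT
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE (database = 'production_db') AND (table = 'health_records') AND (active = 1)
GROUP BY disk_name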

Adding old data

INSERT INTO production_db.health_records
    (patient_id, patient_name, age, gender, diagnosis, date_of_visit, doctor_id, created_at)
VALUES
    (1, 'John Doe', 35, 'Male', 'Fever', '2020-01-01', 1001, '2020-01-01 08:00:00'),
    (2, 'Jane Smith', 45, 'Female', 'Diabetes', '2019-10-15', 1002, '2019-10-15 10:30:00'),
    (3, 'Alice Johnson', 28, 'Female', 'Cold', '2018-05-20', 1003, '2018-05-20 14:45:00');

This data should end up on cold_disk; verify it with the same system.parts query.

Adding new data

INSERT INTO production_db.health_records
    (patient_id, patient_name, age, gender, diagnosis, date_of_visit, doctor_id, created_at)
VALUES
    (4, 'Michael Brown', 55, 'Male', 'Hypertension', '2024-05-01', 1004, NOW()),
    (5, 'Emily Wilson', 40, 'Female', 'Injury', '2024-05-15', 1005, NOW()),
    (6, 'David Miller', 32, 'Male', 'Cold', '2024-06-10', 1006, NOW());

This data should be stored on hot_disk; verify it with the same system.parts query.

Auto-movement of data

What happens when some data gets old (in our case, 7 days is the expiry time for data)? Does it move to cold_disk automatically?

Insert rows whose created_at will cross the 7-day threshold in 5 minutes:

INSERT INTO production_db.health_records
    (patient_id, patient_name, age, gender, diagnosis, date_of_visit, doctor_id, created_at)
VALUES
    (13, 'Michael Brown', 45, 'Male', 'Fever', '2024-05-25', 1013, now() - INTERVAL 7 DAY + INTERVAL 5 MINUTE),
    (14, 'Sophia Garcia', 30, 'Female', 'Diabetes', '2024-05-20', 1014, now() - INTERVAL 7 DAY + INTERVAL 5 MINUTE),
    (15, 'James Lee', 35, 'Male', 'Cold', '2024-05-15', 1015, now() - INTERVAL 7 DAY + INTERVAL 5 MINUTE);

For now, the data should still be on hot_disk; verify it.

After 5 minutes, the data should have been moved to cold_disk; verify it again.
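
These moves are carried out asynchronously by ClickHouse's background move pool, so no manual action is needed. To watch moves that are currently in flight, recent ClickHouse versions expose them in the system.moves table (it is empty when nothing is being moved):

SELECT
    database,
    table,
    part_name,
    target_disk_name,
    elapsed
FROM system.moves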

Moving data on the existing server according to the new policy

If there's already some data in the table, then it's better to set materialize_ttl_after_modify to 0 before making the TTL change to avoid rewriting all table partitions, which can be a very expensive operation for large tables.

SET materialize_ttl_after_modify=0;

By default, once the TTL-based tiered storage policy is modified, both existing and new data are stored according to the new policy.

If you don't want the change to apply to existing data, run SET materialize_ttl_after_modify=0; before modifying the policy. In that case, only new data will be stored according to the new policy.

Also note that after the policy is modified, data already on the cold data disk will not be moved back to the hot data disk automatically, but it can be moved manually (a manual move is sketched below).
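
The manual move uses ALTER TABLE ... MOVE PARTITION (or MOVE PART). A minimal sketch, assuming the table has a partition for May 2024 (partition 202405 under our toYYYYMM partitioning):

ALTER TABLE production_db.health_records
    MOVE PARTITION 202405 TO VOLUME 'hot_volume'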

Benefits

By setting materialize_ttl_after_modify=0 (before making the TTL change), you can avoid the costly rewriting of all the existing partitions, which can save a significant amount of time and resources, especially for large tables.

The new TTL rules will still be applied, but the actual materialization of the changes happens gradually, as new data is inserted and as existing parts are rewritten by background merges (or when the TTL is explicitly materialized). This can help maintain the performance of the system during the TTL change process.

Drawbacks

The main drawback of setting materialize_ttl_after_modify=0 is that the new TTL rules will not be immediately applied to the existing data. Existing parts keep following the old rules until they are rewritten, for example by background merges or an explicit materialization, at which point the new TTL rules take effect.

This means that the storage savings from the new TTL rules may not be realized immediately, and the old data will continue to occupy its current tier until the rules are eventually applied.
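
If you later decide that the new TTL rules should be applied to the existing parts after all, you can force it explicitly, at the cost of rewriting those parts:

ALTER TABLE production_db.health_records
    MATERIALIZE TTL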

Migrating data from the old server to a new server

Migrating data from an old server to a new server is a critical process that requires careful planning and execution to ensure a smooth transition without data loss or downtime. Let's say we have 10TB of data on the old server.

Here are possible ways to migrate the data:

  1. remoteSecure() function

    On the destination server, run

     INSERT INTO local_table 
         SELECT * FROM remoteSecure('server_a_hostname:9440', 'remote_database', 'remote_table', 'remote_user', 'remote_password');
    

    Pros

    • Simple to implement: Easy to run and understand.

    • Minimal downtime: This can be done without significant downtime if properly managed.

Cons

  • Not practical for transferring very large amounts of data in a single pass.

  • If the connection breaks, there is no offset to resume from, so the transfer has to be restarted.

  • Performance impact: This can put a high load on both servers (mostly on the receiver side) as it reestablishes the connection at each request.

  • Network bandwidth: Requires sufficient network bandwidth to transfer large volumes of data.

  • Duration: For 10 TB of data, this can take a considerable amount of time.

  • Might cause inconsistency in case data transfer fails at any point.

  • Timeout issue: https://clickhouse.com/docs/knowledgebase/connection_timeout_remote_remoteSecure

    A workaround is to increase the connection timeout by raising connect_timeout_with_failover_secure_ms from its default of 100 ms to a higher value (e.g., 1 second), as sketched below.
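
    A minimal sketch of that workaround, applied as a session setting on the destination server before running the INSERT ... SELECT (it can also be set in a settings profile):

        SET connect_timeout_with_failover_secure_ms = 1000;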

  2. Using clickhouse-copier

    A demo: https://chistadata.com/clickhouse-copier-replicating-data/

    Pros

    • Optimized for large data transfers: Designed for bulk data transfer between ClickHouse clusters.

    • Resilient to failures: Can resume from the last processed chunk in case of failure.

    • Needs only access to the ClickHouse TCP port.

    • Handles the case when data transfer fails at any point.

Cons

  • Setup complexity: Requires configuration and understanding of the tool. Also requires ZooKeeper.

  • Operational overhead: Needs monitoring and handling of potential issues during the transfer.

  • May create an inconsistent result if source cluster data is changing during the process.

  • High CPU/RAM usage (mostly on clickhouse-copier and receiver side).

  3. Using Backup and Restore

    Steps

    • Create a backup of the data on the old server using clickhouse-backup (newer ClickHouse versions also provide built-in BACKUP/RESTORE SQL commands; a sketch follows this option's pros and cons).

        clickhouse-backup create backup_name
      
    • Transfer the backup files to the new server.

    • Restore the backup on the new server using clickhouse-backup.

        clickhouse-backup restore backup_name
      

Pros

  • Reliable: Ensures an exact copy of the data.

  • Fast restore: Once the backup is transferred, the restore process is usually quick.

Cons

  • Downtime: Requires downtime during the backup and restore process.

  • Storage: Requires additional storage space for the backup files.

  • Transfer time: Transferring large backup files over the network can be time-consuming.
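
    Newer ClickHouse versions also ship built-in BACKUP and RESTORE SQL commands as an alternative to clickhouse-backup. A minimal sketch, assuming a backup destination disk named backups is configured on both servers:

        -- On the old server
        BACKUP TABLE production_db.health_records TO Disk('backups', 'health_records.zip');

        -- After transferring the backup file, on the new server
        RESTORE TABLE production_db.health_records FROM Disk('backups', 'health_records.zip');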

  4. Using Distributed table

    A distributed table is a kind of “view” over local tables: it stores no data of its own but allows distributed query processing across multiple servers, so data can be pulled from the old server into the new one (a minimal sketch follows this option's pros and cons).

    Pros

    • Possible to copy only some subset of data.

    • Needs only access to the ClickHouse TCP port.

Cons

  • High CPU/RAM usage (mostly on the receiver side).
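
    A minimal sketch of this approach, run on the destination server. It assumes the destination table already exists with the same schema, and that migration_cluster is a placeholder for a cluster defined in remote_servers that points at the old server:

        -- A Distributed "view" over the source table (stores no data itself)
        CREATE TABLE production_db.health_records_remote AS production_db.health_records
        ENGINE = Distributed('migration_cluster', 'production_db', 'health_records');

        -- Pull the data (or only a subset of it) into the local table
        INSERT INTO production_db.health_records
        SELECT *
        FROM production_db.health_records_remote
        WHERE created_at >= '2023-01-01';
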
  5. Using Table ATTACH and File Copy

    Steps

    • Stop writing to the old server.

    • Copy the data directory from the old server to the new server (ensuring the same table structure).

    • Attach the tables on the new server.

        ATTACH TABLE new_server_db.table_name FROM '/path/to/data/directory' (/* same column definitions as the source table */);
      

Pros

  • Direct file copy: This can be faster for large datasets.

  • Data integrity: Ensures the exact state of the data is preserved.

Cons

  • Downtime: Requires stopping writes to ensure data consistency.

  • Manual effort: Requires precise file handling and directory structure management.

  6. Using a new replica

    Steps

    • On the destination server, create the table with the ReplicatedMergeTree engine, pointing at the same ZooKeeper/Keeper path as the source replica, so that it joins as a new replica and pulls the existing data automatically (a hedged sketch follows).
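
      A minimal sketch, assuming the source table is (or has been converted to) a ReplicatedMergeTree table registered under the same ZooKeeper/Keeper path; the path and the {replica} macro below are illustrative:

        -- On the new server: same schema and replication path, new replica name
        CREATE TABLE production_db.health_records
        (
            patient_id UInt64,
            patient_name String,
            age UInt8,
            gender Enum8('Male' = 1, 'Female' = 2, 'Other' = 3),
            diagnosis String,
            date_of_visit Date,
            doctor_id UInt32,
            created_at DateTime DEFAULT now()
        )
        ENGINE = ReplicatedMergeTree('/clickhouse/tables/production_db/health_records', '{replica}')
        PARTITION BY toYYYYMM(created_at)
        ORDER BY (patient_id, date_of_visit);

      Once the table exists, ClickHouse replicates the existing parts to the new server in the background.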
      

Pros

  • Simple to set up.

  • Data is consistent all the time automatically.

  • Low CPU and network usage, which can be tuned if needed.

  • Also helpful for fault tolerance.

Cons

  • Some downtime is possible.

  • In case of cluster migration, ZooKeeper needs to be migrated too.

  7. Using rsync (transferring and synchronizing files)

    A demo: https://www.digitalocean.com/community/tutorials/how-to-use-rsync-to-sync-local-and-remote-directories

    Steps

    • Copy the 10 TB of data (initial full pass).

        rsync -avhzP --delete /var/lib/clickhouse/data/production_db/health_records/ root@remote_host:/var/lib/clickhouse/data/production_db/health_records/detached
      
    • Run again. It will copy let's say ~50GB that was changed during the first rsync.

        rsync -avhzP --delete /var/lib/clickhouse/data/production_db/health_records/ root@remote_host:/var/lib/clickhouse/data/production_db/health_records/detached
      
    • Run again. It will copy let's say ~5GB that was changed during the second rsync.

        rsync -avhzP --delete /var/lib/clickhouse/data/production_db/health_records/ root@remote_host:/var/lib/clickhouse/data/production_db/health_records/detached
      
    • Run again. It will copy let's say ~0GB that was changed during the third rsync.

        rsync -avhzP --delete /var/lib/clickhouse/data/production_db/health_records/ root@remote_host:/var/lib/clickhouse/data/production_db/health_records/detached
      
    • Stop the source ClickHouse.

    • Run again. It will copy let's say ~0GB that was changed during the fourth rsync.

        rsync -avhzP --delete /var/lib/clickhouse/data/production_db/health_records/ root@remote_host:/var/lib/clickhouse/data/production_db/health_records/detached
      
    • Attach the copied partitions to the table with ALTER TABLE ... ATTACH PARTITION (see the sketch after these steps).

    • Start the destination ClickHouse.

      Check that all data is in place by comparing sizes and start to use the destination ClickHouse.
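
      For the ATTACH step, the copied parts appear in system.detached_parts and can be attached partition by partition (202405 below is an illustrative partition from our toYYYYMM partitioning):

        -- On the destination server: list the parts that rsync placed in detached/
        SELECT table, name, disk
        FROM system.detached_parts
        WHERE (database = 'production_db') AND (table = 'health_records');

        -- Attach one of the copied partitions back into the table
        ALTER TABLE production_db.health_records ATTACH PARTITION 202405;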

Flags used

  • -h: output numbers in a human-readable format

  • -a: syncs recursively and preserves symbolic links, special and device files, modification times, groups, owners, and permissions

  • -z: compress file data during the transfer

  • -v: verbose

  • -P: shows progress

  • --delete: deletes files in the destination directory if they don't exist in the source directory.

Pros

  • Low CPU/RAM usage

  • rsync can copy only deltas.

  • Ability to resume interrupted transfers

Cons

  • The same table schema is needed.

  • A lot of manual operations.

Real-World Use Cases

E-commerce Analytics

  • Scenario: An e-commerce company tracks customer behavior, sales transactions, and inventory levels.

  • Hot Data: Recent transactions, active user sessions, and real-time clickstream data are stored on high-performance SSDs for fast query performance and real-time analytics.

  • Cold Data: Historical sales data, older customer interaction records, and long-term inventory history are moved to more cost-effective cold storage.

  • Benefit: This setup ensures that recent and frequently accessed data is quickly available for immediate analysis and decision-making, while older data is preserved cost-effectively for trend analysis and reporting.

Financial Services

  • Scenario: A financial institution monitors trading activities, market data, and customer transactions.

  • Hot Data: Current market data, recent trading activities, and real-time fraud detection data are stored on high-performance storage for low-latency access and rapid processing.

  • Cold Data: Historical transaction records, regulatory compliance data, and older market analytics are stored in cold storage.

  • Benefit: Ensures immediate access to current and time-sensitive financial data for trading and compliance while keeping older records accessible for audits and historical analysis at reduced costs.

Healthcare Analytics

  • Scenario: A healthcare provider handles patient records, medical imaging, and operational data.

  • Hot Data: Recent patient visits, current medical imaging, and active operational data are stored on high-performance storage for immediate access by healthcare professionals.

  • Cold Data: Historical patient records, past imaging, and older operational data are stored in cold storage.

  • Benefit: Ensures quick access to current patient information for timely medical care while preserving older data for longitudinal studies and regulatory requirements.

Conclusion

In conclusion, ClickHouse's multi-tier storage feature revolutionizes data management and analysis. It addresses the challenges of handling massive data volumes efficiently while reducing costs and enhancing performance.

Real-world use cases across industries underscore its versatility and effectiveness. By leveraging multi-tier storage, organizations can drive innovation and gain a competitive edge in the data-driven economy. Moreover, ClickHouse offers migration strategies tailored to different needs, ensuring seamless transitions and optimal data handling.

References

https://clickhouse.com/docs/en/guides/developer/ttl#implementing-a-hotwarmcold-architecture

https://www.tencentcloud.com/document/product/1129/44423

https://github.com/Altinity/clickhouse-backup

https://clickhouse.com/docs/en/operations/backup

https://altinity.com/blog/2018-8-22-clickhouse-copier-in-practice

https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-data-migration/