Mueez Khan

czv for CSV Operations

czv for CSV Operations

A collection of libraries for running CSV operations called czv. Built in Rust, available for Rust, Python, and WebAssembly (JavaScript/TypeScript).
Published: 2024-06-21
Last updated: 2024-06-21
side-project
rust
python
webassembly
javascript
typescript
react
data-engineering
data-analysis

A recent library I've been working on is czv as a side project built with Rust and including bindings for Python and WebAssembly (JavaScript/TypeScript).

Note: The code may be outdated/updated by the time you read this as this post was published when czv was released.

Examples

For the upcoming examples we'll be using the following CSV data:
fruit,price
apple,2.50
banana,3.00
strawberry,1.50
Since there are three libraries, there are three primary ways you may use czv.

Rust

Install the crate from crates.io/crates/czv by running:
cargo install czv
Now let's print out the row count (including the header row) for our CSV data:
use czv::{RowCount, Result};
 
fn main() -> Result<()> {
    let data: &str = "\
fruits,price
apple,2.50
banana,3.00
strawberry,1.50
";
    let output: usize = RowCount::new()
        .file_data(data)
        .include_header_row(true)
        .execute()?;
    println!("{output}"); // 4
    Ok(())
}
In the czv Rust library, the builder pattern is recommended. Here are some notes on this example:
  1. The RowCount is a Rust struct we consider as our CSV operation so we import it in the first line.
  2. We assign the CSV data to a variable as a &str.
  3. The RowCount::new() method must be called to generate a RowCountBuilder.
  4. The RowCountBuilder allows us to run multiple methods to pass conditional arguments. In this case we run .file_data(data) to provide our CSV data (alternatively for a file path use file_path(path)) and .include_header_row(true) to include the CSV data's first row in the total row count.
  5. The .execute() method is ran at the end to get the output as a Result<T, CzvError>. In this case T is a usize (representing the row count) and if .execute() runs successfully we get the value thanks to the question-mark operator and assign it to our variable output.
For further documentation on the czv Rust library, check out docs.rs/czv.

Python

Install the package from pypi.org/project/czv by running:
pip install czv
Personally I prefer using uv instead of pip.
Now let's print out the row count (including the header row) for our CSV data:
import czv
 
data: str = """fruits,price
apple,2.50
banana,3.00
strawberry,1.50"""
 
output: int = czv.row_count(file_data=data, include_header_row=True)
 
print(output) # 4
Similar to the czv Rust library's RowCount builder, we may use a row_count function directly in Python since we may provide optional parameters and keyword arguments whereas in Rust the builder pattern suffices for this necessity.

WebAssembly (JavaScript/TypeScript)

Install the package from npmjs.com/package/czv-wasm by running:
bun install czv-wasm
You may replace bun with the compatible package manager you use (e.g., npm, pnpm, yarn).
I received an error when trying to register czv as the package name since its name is too similar to csv, so we use czv-wasm instead.
How you use this library depends on the scenario it's used in as some frameworks make it easier to work with than others. For example I'll use czv-wasm within this page. My website is built with Next.js and this page uses MDX so first I'll try installing the czv-wasm package:
bun install czv-wasm
Then I'll make a React component based on the web demo example (which is built with Vite and React):
"use client";
 
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Switch } from "@/components/ui/switch";
import init, * as czv from "czv-wasm";
import React, { useState } from "react";
 
const CzvExample = () => {
    const [initDone, setInitDone] = useState(false);
    const [loading, setLoading] = useState(false);
    const [rowCount, setRowCount] = useState<number | undefined>(undefined);
    const [columnCount, setColumnCount] = useState<number | undefined>(
        undefined
    );
    const [includeHeaderRow, setIncludeHeaderRow] = useState(false);
 
    const handleSwitch = (e: boolean) => {
        setRowCount(undefined);
        setIncludeHeaderRow(e);
    };
 
    const handleFile = async (e: React.ChangeEvent<HTMLInputElement>) => {
        setLoading(true);
        // Reset counts for new file
        setRowCount(undefined);
        setColumnCount(undefined);
        // Running `init` first is required to use the czv library
        if (!initDone) {
            await init();
            setInitDone(true);
        }
        if (e.target.files && e.target.files.length === 1) {
            // Only allow a single CSV file as input
            if (e.target.files[0].type !== "text/csv") {
                setLoading(false);
                return;
            }
            const data = await e.target.files[0].text();
            const rowCountOutput = czv.rowCount({
                file_data: data,
                include_header_row: includeHeaderRow,
            });
            setRowCount(rowCountOutput);
            const columnCountOutput = czv.columnCount({
                file_data: data,
            });
            setColumnCount(columnCountOutput);
        }
        setLoading(false);
    };
 
    return (
        <>
            <div className="flex gap-2">
                <div>
                    <Label htmlFor="csv">Import your CSV file</Label>
                    <Input
                        className="cursor-pointer mb-2"
                        onChange={handleFile}
                        id="csv"
                        type="file"
                        accept="text/csv"
                    />
                </div>
                <div className="flex items-center space-x-2 mt-3">
                    <Switch
                        onCheckedChange={handleSwitch}
                        id="include-header-row"
                    />
                    <Label htmlFor="include-header-row">
                        {includeHeaderRow ? "Include" : "Exclude"} header row
                    </Label>
                </div>
            </div>
            {loading && <p>Loading...</p>}
            {rowCount && (
                <p>
                    <strong>
                        Row count (
                        {includeHeaderRow ? "including" : "excluding"} header
                        row)
                    </strong>
                    : {rowCount}
                </p>
            )}
            {columnCount && (
                <p>
                    <strong>Column count</strong>: {columnCount}
                </p>
            )}
        </>
    );
};
 
export default CzvExample;
The code could be refactored but I'm mainly trying to show the functionality. Since I'm using Next.js I add "use client" to render this on the client-side which allows useState to work. The component imports are from the shadcn/ui library. I'll add the component to this page's MDX file as so:
import CzvExample from "@/path/to/CzvExample";
 
<CzvExample />

Now you may try it out above! Import a CSV file using the file browser. You may also click the switch to toggle whether you want to include the header (first) row from your CSV file. The row count and column count should be computed by czv-wasm within your browser and displayed. If you'd like to explore more with czv-wasm, docstrings are provided for TypeScript.

Conclusion

At the time this post is published the row count and column count operations are the only two available operations. However there was a lot I've learned as I built these libraries especially since it's not just a library for a single language. Feel free to star the project on GitHub here and I encourage you to try it out.
Back to blog