Importing multiple zsh histories with overlapping content results in duplicate entries in the database #2904

@gheift

Hello,

I started using atuin and thought, why not import the old zsh_history files from my backups?

Because of my history settings, the oldest entries are missing from the newer backups. As a result, two files can contain the same entries, but at different line positions.

Older file:

: 1682777923:0;nvim config.h
: 1682777942:0;ls ../../../bin
: 1682780638:0;strace -f -e exec make
: 1682781137:0;strace -f -e execve make

Newer file:

: 1682777942:0;ls ../../../bin
: 1682780638:0;strace -f -e exec make
: 1682781137:0;strace -f -e execve make
: 1682782438:0;man strace

When both files are imported, the same entries end up with different timestamps because they sit at different line positions.

For every new line, a counter is increased by one, and this counter is then added to the timestamp as milliseconds:

+ time::Duration::milliseconds(counter);

So ls ../../../bin is imported once with the timestamp 1682777942.001 and once with 1682777942.000.
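To make the effect concrete, here is a minimal sketch of a per-file line counter applied to the two files above. It is not atuin's actual importer: `parse_extended` and `import_with_line_counter` are hypothetical names, timestamps are kept as plain milliseconds in an `i64`, and the counter is assumed to start at zero.

```rust
/// Parse one zsh extended-history line of the form ": <secs>:<duration>;<command>".
/// Hypothetical helper, not atuin's parser.
fn parse_extended(line: &str) -> Option<(i64, &str)> {
    let rest = line.strip_prefix(": ")?;
    let (meta, command) = rest.split_once(';')?;
    let (secs, _duration) = meta.split_once(':')?;
    Some((secs.trim().parse::<i64>().ok()?, command))
}

/// Assign each entry a timestamp in milliseconds: seconds * 1000 plus a
/// counter that grows with every line of the file (assumed to start at 0).
fn import_with_line_counter(lines: &[&str]) -> Vec<(i64, String)> {
    let mut out = Vec::new();
    let mut counter: i64 = 0;
    for line in lines {
        if let Some((secs, command)) = parse_extended(line) {
            out.push((secs * 1000 + counter, command.to_string()));
            counter += 1;
        }
    }
    out
}
```

Feeding the older and the newer file through `import_with_line_counter` gives `ls ../../../bin` two different millisecond values, because it is the second line in one file and the first line in the other.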

I understand that the counter is there to preserve the order of multiple entries with the same timestamp.

In my opinion, though, the counter should be reset whenever the timestamp is larger than the timestamp of the previous line.
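A rough sketch of that proposal, under the same assumptions as before (millisecond timestamps in an `i64`, counter starting at zero); `import_with_reset_counter` is a hypothetical name and takes already-parsed `(seconds, command)` pairs:

```rust
/// Assign millisecond timestamps, resetting the counter whenever the
/// timestamp in seconds is larger than the one on the previous line.
/// The counter only grows within a run of lines whose timestamp does
/// not increase.
fn import_with_reset_counter(entries: &[(i64, &str)]) -> Vec<(i64, String)> {
    let mut out = Vec::with_capacity(entries.len());
    let mut last_secs: Option<i64> = None;
    let mut counter: i64 = 0;
    for &(secs, command) in entries {
        match last_secs {
            // Same (or older) timestamp than the previous line: keep counting.
            Some(prev) if secs <= prev => counter += 1,
            // First line, or a larger timestamp: start again at zero.
            _ => counter = 0,
        }
        last_secs = Some(secs);
        out.push((secs * 1000 + counter, command.to_string()));
    }
    out
}
```

With the two files above, the overlapping entries get identical timestamps from both imports. One caveat: if a newer file starts in the middle of a run of entries that share the same second, the offsets inside that run would still differ between the two files.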

Another problem I have now: my history database has approximately 800k entries. In WSL2, it takes about 12 seconds to show the history, and each key press takes 2 seconds (or more) to update the filtered history.

I have a query that removes those duplicated lines without removing duplicates that have different timestamps, but I do not know how to synchronize the deleted entries to my server and other workstations:

-- Groups of rows with duration = 0 that share command, cwd, exit code
-- and the same value of timestamp / 1000000000.
with keep as (
    select count(id) as c, min(id) as min_id, command, cwd, duration, exit, timestamp
    from history
    where duration = 0
    group by command, cwd, duration, exit, timestamp / 1000000000
    having count(id) > 1
), remove_ids as (
    -- Every row of such a group except the one with the smallest id.
    select history.id
    from history
    join keep on
        history.command = keep.command
        and history.cwd = keep.cwd  -- added: the groups above are per cwd
        and history.duration = 0
        and history.exit = keep.exit
        and history.id != keep.min_id
        and history.timestamp / 1000000000 = keep.timestamp / 1000000000
)
delete from history
    where id in (select id from remove_ids);
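For readers who prefer to see the selection rule outside SQL, the following is a small in-memory sketch of the same idea. The `Row` struct and `ids_to_remove` are hypothetical and only mirror the columns used in the query; the sketch assumes that `id` is text and that rows must also match on `cwd` to count as duplicates.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory copy of the columns the query uses.
#[derive(Debug, Clone)]
struct Row {
    id: String,
    command: String,
    cwd: String,
    duration: i64,
    exit: i64,
    timestamp: i64,
}

/// Return the ids that the query would delete: among rows with duration == 0
/// that share command, cwd, exit and timestamp / 1_000_000_000, every row
/// except the one with the smallest id.
fn ids_to_remove(rows: &[Row]) -> Vec<String> {
    let key = |r: &Row| (r.command.clone(), r.cwd.clone(), r.exit, r.timestamp / 1_000_000_000);

    // Smallest id seen per group.
    let mut keep: HashMap<(String, String, i64, i64), String> = HashMap::new();
    for r in rows.iter().filter(|r| r.duration == 0) {
        let entry = keep.entry(key(r)).or_insert_with(|| r.id.clone());
        if r.id < *entry {
            *entry = r.id.clone();
        }
    }

    // Everything in a group that is not the kept id gets removed.
    rows.iter()
        .filter(|r| r.duration == 0)
        .filter(|r| keep[&key(r)] != r.id)
        .map(|r| r.id.clone())
        .collect()
}
```

Groups with a single row produce no removals, which corresponds to the `having count(id) > 1` clause.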

Can someone give me a hint how I can synchronize these deleted records?

Thanks
Gerhard
