Skip to content

Feature Request: Add expand method to explicitly set new difference values in Collections #617

@HarukiMoriarty

Description

@HarukiMoriarty

Is your feature request related to a problem? Please describe.

I'm currently using differential-dataflow (0.15.2) as part of a Datalog-based query engine. In our logic-oriented context, we frequently need to explicitly set difference values for records independently of their original differences.

Specifically, two scenarios arise frequently:

  • Set-difference queries (e.g., A - B): explicitly setting differences to +1 or -1 for tuples in the computation.
  • Exact counting (cardinality): where original differences must be ignored, and each tuple must explicitly be assigned a difference of exactly 1.

Currently, the built-in method explode multiplies provided differences by the original tuple's differences, which does not match our intended semantics when original differences vary.

Describe the solution you'd like

I'd like to propose adding a method named expand, which explicitly sets new differences, independent of original tuple differences:

/// Replaces each record explicitly with another, with a specified difference type.
///
/// Unlike `explode`, the new difference is not multiplied by the original difference.
pub fn expand<D2, R2, I, L>(&self, mut logic: L) -> Collection<G, D2, R2>
where
    D2: Data,
    R2: Semigroup + 'static,
    I: IntoIterator<Item = (D2, R2)>,
    L: FnMut(D) -> I + 'static,
{
    self.inner
        .flat_map(move |(x, t, _)| logic(x).into_iter().map(move |(x, d2)| (x, t.clone(), d2)))
        .as_collection()
}

Describe alternatives you've considered

  1. Using the built-in explode method: Not suitable, as it multiplies differences.
  2. Using custom patches: Currently, we've patched our local version, but we'd prefer an official method in differential-dataflow for clarity, correctness, and ease of maintenance.

Additional context

A concrete Datalog-based example illustrating the need for expand:

// Represents Datalog's negation operation: Out(x) :- A(x), not B(x)
let difference = A.expand(|x| Some((x, 1)))
                  .concat(&B.expand(|x| Some((x, -1))))
                  .threshold_semigroup(|_, _, old| old.is_none().then_some(semiring_one()));

This clearly expresses "tuples in A but not in B," assigning explicit differences independent of original values, and can't be directly and correctly implemented with explode.

I believe this method is simple, intuitive, and complementary to the existing methods (map, flat_map, explode). I'd appreciate your consideration of this addition.

Thank you very much!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions