
Performance issues with large amounts of dynamic data #1

@sotex

Description


This is really a great project. It is very convenient to use and the performance is very good.
But I'm having some problems with it and hope to get help here.

Question 1

I have a large number of GeoJSON objects in my program. In order to query them with jmespath, I have to combine them into one large array every time, similar to the following code:

    // These GeoJSON objects live in a large map that is dynamically added to and deleted from:
    // std::map<std::string, jp::Json> mydata;

    // Every time I need to run a jmespath query, I copy the values into one large array:
    std::vector<jp::Json> vec;
    vec.reserve(mydata.size());
    for (auto& kvpair : mydata) {
        vec.push_back(kvpair.second);   // copies every object
    }
    jp::Json data = {
        {"data", std::move(vec)}
    };
    // Simple example; the actual expression varies
    jp::Expression expr = "avg(data[?properties.area < `100`].properties.area)";
    auto result = jp::search(expr, data);

I could change mydata itself to store the objects in a jp::Json document and avoid this conversion on every query (a sketch of what I mean is below). However, I wonder if there is a better way?
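
Something like the following is what I have in mind as the alternative. It is only a sketch: it assumes jp::Json is the nlohmann::json alias the library uses, that the standard JMESPath values() built-in is supported, and the helper names (add_feature, remove_feature, average_small_area) are just for illustration. The idea is to keep the objects directly in one jp::Json object keyed by id, so inserts and deletes stay cheap and the query runs on the stored document without rebuilding an array:

    #include <jmespath/jmespath.h>

    #include <string>
    #include <utility>

    namespace jp = jmespath;

    // Keep all features in a single jp::Json object keyed by id,
    // so there is no per-query copy into a std::vector.
    jp::Json mydata = jp::Json::object();

    void add_feature(const std::string& id, jp::Json feature) {
        mydata[id] = std::move(feature);   // dynamic insert
    }

    void remove_feature(const std::string& id) {
        mydata.erase(id);                  // dynamic delete
    }

    jp::Json average_small_area() {
        // values(@) turns the keyed object back into an array for the filter;
        // the stored document is queried directly, nothing is rebuilt.
        jp::Expression expr =
            "avg(values(@)[?properties.area < `100`].properties.area)";
        return jp::search(expr, mydata);
    }

The trade-off I see is that the document is now an object rather than an array, so every expression needs the values(@) step; I do not know whether the library recommends a more direct way.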

Question 2

Because I have a large amount of data, I tested the filtering operation on 10,000 objects; it takes about 0.34 s. But I have more than 200,000 objects.
Test environment:

OS: Linux x-mini 5.3.0-45-generic
CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz x4
MEM: 8 GB DDR3 1333
Compiler and options: g++ 10.0 with -O2

I could use multi-threading to filter in parallel, but that produces several partial results, which then need a second merging pass (a sketch of what I mean is after this paragraph). Is there a good way to do this without that secondary processing? Thanks.
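
For reference, this is the kind of parallel filtering plus merge I mean. It is only a sketch: the chunk count and function name are my own choices, and since I do not know whether a single jp::Expression can safely be shared across threads, each task builds its own. The filter runs per chunk, and the "secondary processing" is the concatenation and final avg at the end:

    #include <jmespath/jmespath.h>

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <utility>
    #include <vector>

    namespace jp = jmespath;

    // Filter the objects in parallel chunks, then merge and reduce.
    jp::Json parallel_average(const std::vector<jp::Json>& objects,
                              std::size_t num_threads = 4) {
        const std::size_t chunk =
            (objects.size() + num_threads - 1) / num_threads;
        std::vector<std::future<jp::Json>> parts;

        for (std::size_t begin = 0; begin < objects.size(); begin += chunk) {
            const std::size_t end = std::min(begin + chunk, objects.size());
            parts.push_back(std::async(std::launch::async, [&objects, begin, end] {
                // A separate Expression per task, in case sharing one is unsafe.
                jp::Expression filter =
                    "[?properties.area < `100`].properties.area";
                // Build a chunk-local array and run only the filter on it.
                jp::Json slice = jp::Json::array();
                for (std::size_t i = begin; i < end; ++i) {
                    slice.push_back(objects[i]);
                }
                return jp::search(filter, slice);
            }));
        }

        // "Secondary processing": concatenate the per-chunk matches ...
        jp::Json matched = jp::Json::array();
        for (auto& part : parts) {
            for (auto& value : part.get()) {
                matched.push_back(std::move(value));
            }
        }

        // ... and run the final reduction once on the merged array.
        jp::Expression avg = "avg(@)";
        return jp::search(avg, matched);
    }

The result should match running the single expression over one big array, but the merge loop is exactly the secondary processing I would like to avoid, so a built-in or smarter approach would be welcome.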
