I have an API endpoint that takes as input some data with "date_from" and "date_to" fields.
When a request is made, it initiates the generation of a report. The "date_from" and "date_to" fields are used to generate (date_to - date_from).days
subrequests. For each subrequest a hash is computed, which is used either to fetch the cached subresponse from Redis or, on a cache miss, to compute the subresponse and store it in Redis under that hash. In the end all subresponses are aggregated and returned as the actual response.
I had a use case where all the data was already stored in Redis, but for long ranges between date_from and date_to that is still (date_to - date_from).days
requests to the cache database. So I decided to also store the final response of the whole request in Redis in the same fashion, by generating a hash of the full request.
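Conceptually, the per-day caching works like this simplified sketch (the names `get_or_compute_day`, `day_payload` and `compute_subresponse` are placeholders for illustration, not my real functions):

    import hashlib
    import json
    from datetime import timedelta

    def get_or_compute_day(redis_client, day_payload, compute_subresponse):
        # The serialized subrequest is hashed to form its cache key.
        key = hashlib.sha256(str(day_payload).encode()).hexdigest()
        cached = redis_client.get(key)
        if cached is not None:
            return json.loads(cached)
        # Cache miss: compute the subresponse and store it under the hash.
        subresponse = compute_subresponse(day_payload)
        redis_client.setex(key, timedelta(days=180), json.dumps(subresponse))
        return subresponse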
My problem is that these reports are generated regularly with a sliding window of date_from and date_to. For example, yesterday date_from = "2017-03-08" and date_to = "2020-05-07",
but today it would be date_from = "2017-03-09" and date_to = "2020-05-08". Which means that:

- most of the report is already cached, but the sheer number of days still slows the process down considerably;
- a very similar report was ready yesterday and could be served within seconds, but it does not cover today's full range, and there is no way to know that the two are similar (see the short calculation after this list).
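To quantify the overlap, with the example dates above only one day out of more than a thousand changes between the two reports:

    from datetime import date

    old_from, old_to = date(2017, 3, 8), date(2020, 5, 7)
    new_from, new_to = date(2017, 3, 9), date(2020, 5, 8)

    days_today = (new_to - new_from).days  # subrequests in today's report
    days_shared = (min(old_to, new_to) - max(old_from, new_from)).days  # days already covered yesterday
    print(days_today, days_shared)  # 1156 1155 -> only the newest day is missing from yesterday's report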
Here is my code:
    def generate_report(self, serialized_data):
        result = {'deviation': []}
        # Hash of the full request: cache key for the aggregated report.
        total_hash = hashlib.sha256(str(serialized_data).encode()).hexdigest()
        total_target = self.redis.get(total_hash)
        if not total_target:
            for date_from, date_to in self.date_range:
                serialized_data['media_company']['date_from'] = \
                    date_from.strftime("%Y-%m-%d")
                serialized_data['media_company']['date_to'] = \
                    date_to.strftime("%Y-%m-%d")
                # Hash of the single-day subrequest: cache key for the subresponse.
                hash = hashlib.sha256(str(serialized_data).encode()).hexdigest()
                target = self.redis.get(hash)
                media_company, context, validator = \
                    self.prepare_for_validation(serialized_data)
                if not target:
                    # Cache miss: compute the subresponse and store it for 180 days.
                    target = validator.check({'media_company': media_company, **context})
                    self.redis.setex(hash, timedelta(days=180), json.dumps(target))
                else:
                    # Cache hit: refresh the TTL and reuse the stored subresponse.
                    self.redis.expire(hash, timedelta(days=180))
                    target = json.loads(target)
                result['deviation'].append(target['deviation'])
            result['date'] = [str(date_to) for date_from, date_to in self.date_range]
            total_target = result
            self.redis.setex(total_hash, timedelta(days=180), json.dumps(total_target))
        else:
            total_target = json.loads(total_target)
        return total_target
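For reference, serialized_data is shaped roughly like this (only the fields used above are shown; everything else is omitted):

    serialized_data = {
        'media_company': {
            'date_from': '2017-03-08',
            'date_to': '2020-05-07',
            # ... other media_company fields
        },
        # ... other request fields
    }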
total_hash is the hash of the initial request data,
self.date_range is the list of per-day date ranges for the subrequests,
and hash is the hash of each individual subrequest.
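In other words, self.date_range is essentially a list of consecutive one-day windows covering the requested period; a simplified reconstruction (not my exact code) would be:

    from datetime import date, timedelta

    def build_date_range(date_from, date_to):
        # One (start, end) pair per day in the requested interval.
        return [
            (date_from + timedelta(days=i), date_from + timedelta(days=i + 1))
            for i in range((date_to - date_from).days)
        ]

    # build_date_range(date(2017, 3, 8), date(2017, 3, 11)) gives three one-day windows:
    # [(2017-03-08, 2017-03-09), (2017-03-09, 2017-03-10), (2017-03-10, 2017-03-11)]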
Could you recommend a better way of caching this data, or ways to speed up this algorithm?