Statistics and AI challenge

I need to build one or more models that predict a product's monthly sales from its best-seller rank.

The challenge lies in the data. A top-level category, say 'baby', has many subcategories (toys, clothes, etc.), and each of those subcategories has its own subcategories, so the categories form a tree. Among sibling nodes, the products in the best-seller lists are unique; in other words, no product appears in the lists of two siblings. For each category I only have a list of its 1000 best-selling items, although the category may contain many more items, and I cannot see anything beyond those 1000. A subcategory's 1000-item list may overlap with its parent's list, but it can also contain items that are not in the parent's list (for example, the parent's 1001st item, which is not visible in the parent's list).

There are 202 categories in total in the 'baby' category tree.

Getting sales data per month of a product (to 'track' a product) is an expensive operation so I would like to minimize the number of items to track. I can track any product.

When plotted on a graph, best-seller rank and monthly sales have a logarithmic relationship.
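To make the logarithmic relationship concrete, here is a minimal sketch that assumes it takes the form sales ≈ a + b · ln(rank). That functional form, the helper names `fit_log_model` and `predict_sales`, and the numbers are all illustrative assumptions, not real data. If the real curve is instead a straight line on a log-log plot, the same fit would be applied to ln(sales).

```python
import numpy as np

def fit_log_model(ranks, sales):
    """Least-squares fit of sales ~ a + b * ln(rank) (assumed form)."""
    # polyfit returns the highest-degree coefficient first: slope, then intercept.
    b, a = np.polyfit(np.log(ranks), sales, deg=1)
    return a, b

def predict_sales(a, b, rank):
    """Predicted monthly sales for a rank, clipped so it never goes negative."""
    return np.maximum(a + b * np.log(rank), 0.0)

# Synthetic, made-up observations for four tracked products in one category.
tracked_ranks = np.array([1, 10, 100, 1000])
tracked_sales = 9000.0 - 1300.0 * np.log(tracked_ranks)

a, b = fit_log_model(tracked_ranks, tracked_sales)
estimate = predict_sales(a, b, 250)  # estimate for an untracked product at rank 250
```

With real tracked products the points will not lie exactly on the curve, so the residuals of the fit give a rough idea of how reasonable the predictions are.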

How do I come up with a way to estimate monthly sales of any product given a product's best-selling ranking? How do I do the above with the minimum number of 'tracked' products such that the prediction is still reasonable? What is the approximate number of items that need to be tracked?
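One possible starting point for choosing which products to track, offered as a sketch and not as an answer to how many are needed: if sales vary roughly linearly with ln(rank), ranks spaced evenly on a log scale cover the curve evenly. The helper name `log_spaced_ranks` and the default of 10 tracked items are hypothetical choices for illustration.

```python
import numpy as np

def log_spaced_ranks(max_rank=1000, n_tracked=10):
    """Pick up to n_tracked ranks between 1 and max_rank, evenly spaced in ln(rank)."""
    raw = np.geomspace(1, max_rank, num=n_tracked)
    # Round to whole ranks and drop duplicates that rounding may create.
    return sorted({int(round(float(r))) for r in raw})

ranks_to_track = log_spaced_ranks()
```

The number of tracked items could then be increased step by step until the fitted curve stops changing noticeably.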
