
Categorize scraped list of Wikipedia links using python

I have a long list (over 100k rows) of links scraped from Wikipedia, currently stored in a CSV file. For example:

/wiki/Stacy_Jones_(baseball)
/wiki/Dre_Kirkpatrick
/wiki/University_of_Alabama_Crimson_Tide
/wiki/Freddie_Kitchens
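Whatever the classification method, each row first has to become a page title. Here is a minimal sketch, assuming every row is an English Wikipedia article path of the form `/wiki/Title`. The helper name `link_to_title` is hypothetical. It trims stray whitespace, drops any `#Section` fragment, decodes percent-escapes and turns underscores into spaces, which MediaWiki treats as equivalent in titles.

```python
from urllib.parse import unquote


def link_to_title(link: str) -> str:
    """Turn a scraped '/wiki/...' href into a readable article title (hypothetical helper)."""
    path = link.strip()                     # tolerate stray whitespace left by scraping
    path = path.split("#", 1)[0]            # drop any '#Section' fragment
    if path.startswith("/wiki/"):
        path = path[len("/wiki/"):]         # remove the article-path prefix
    return unquote(path).replace("_", " ")  # decode %XX escapes, then underscores -> spaces


for row in ["/wiki/Stacy_Jones_(baseball)", "/wiki/University_of_Alabama_Crimson_Tide"]:
    print(link_to_title(row))
```

For the sample rows this yields titles such as "Stacy Jones (baseball)" and "University of Alabama Crimson Tide".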

I would like to remove every row whose link does not point to a page about a person. For instance, in the list above I would want to remove "University of Alabama Crimson Tide" because it is not a person.

Using Python/pandas and/or the Wikipedia API, is there a way to loop through my list and automatically remove every entry that isn't a person? Any help is much appreciated.
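One possible approach, not taken from the question itself, is to let Wikidata decide. Most Wikipedia articles are linked to a Wikidata item, and biographies of real people usually carry the statement "instance of" (P31) = "human" (Q5). The sketch below assumes English Wikipedia, network access, Python 3.9+ (for `str.removeprefix`), and that "person" means exactly "P31 includes Q5", so fictional characters, bands and similar pages are dropped. It asks the MediaWiki API (`action=query&prop=pageprops&ppprop=wikibase_item`) for each page's Wikidata id, 50 titles per request, then fetches those items' claims with `action=wbgetentities`. All function names and the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

WIKI_API = "https://en.wikipedia.org/w/api.php"
WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Placeholder: Wikimedia asks API clients to send a descriptive User-Agent.
HEADERS = {"User-Agent": "wiki-person-filter/0.1 (contact: you@example.com)"}
BATCH = 50  # both endpoints accept up to 50 titles/ids per request for regular clients


def chunked(items, size=BATCH):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def api_get(url, params):
    """Send a GET request to a MediaWiki-style API and decode the JSON body."""
    query = urllib.parse.urlencode({**params, "format": "json", "formatversion": "2"})
    request = urllib.request.Request(f"{url}?{query}", headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def is_human(entity):
    """True if a Wikidata entity's P31 (instance of) claims include Q5 (human)."""
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and value.get("id") == "Q5":
            return True
    return False


def titles_to_qids(titles):
    """Map each title to its Wikidata item id, or None if the page has none."""
    qids = {}
    for batch in chunked(titles):
        data = api_get(WIKI_API, {
            "action": "query",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "redirects": "1",
            "titles": "|".join(batch),
        })
        query = data.get("query", {})
        normalized = {n["from"]: n["to"] for n in query.get("normalized", [])}
        redirects = {r["from"]: r["to"] for r in query.get("redirects", [])}
        found = {p["title"]: p.get("pageprops", {}).get("wikibase_item")
                 for p in query.get("pages", [])}
        for title in batch:
            # Follow the API's own rewrites: input -> normalized title -> redirect target.
            final = normalized.get(title, title)
            final = redirects.get(final, final)
            qids[title] = found.get(final)
    return qids


def human_qids(qids):
    """Return the subset of the given Wikidata ids whose P31 includes Q5."""
    humans = set()
    for batch in chunked(sorted(set(qids))):
        data = api_get(WIKIDATA_API, {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "claims",
        })
        for qid, entity in data.get("entities", {}).items():
            if is_human(entity):
                humans.add(qid)
    return humans


def person_links(links):
    """Keep only the '/wiki/...' links whose article is about a human."""
    titles = {link: urllib.parse.unquote(link.strip().removeprefix("/wiki/"))
              for link in links}
    qid_by_title = titles_to_qids(sorted(set(titles.values())))
    humans = human_qids(q for q in qid_by_title.values() if q)
    return [link for link in links if qid_by_title.get(titles[link]) in humans]
```

With pandas, the result could be applied as `df = df[df["link"].isin(person_links(df["link"].tolist()))]`, assuming the CSV column is named `link`. For 100k rows this works out to roughly 2,000 requests per API, so caching the intermediate title-to-id mapping to disk is worth considering. Pages without a Wikidata item are treated as "not a person".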
