I have a long list (over 100k rows) of links scraped from Wikipedia, currently stored in a CSV file. For example:
/wiki/Stacy_Jones_(baseball)
/wiki/Dre_Kirkpatrick
/wiki/University_of_Alabama_Crimson_Tide
/wiki/Freddie_Kitchens
I would like to remove all rows where the links do not go to a page about a person. For instance, in the list above, I would want to remove "University of Alabama Crimson Tide" because that is not a person.
Using Python/pandas and/or the Wikipedia API, is there a way to loop through my list and automatically remove all entries that aren't people? Any help is much appreciated.
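One possible approach, offered as a rough sketch rather than a tested solution: look up each page's Wikidata item and keep only those whose "instance of" property (P31) points to "human" (Q5). The code below assumes the links are English Wikipedia paths like the ones above and uses the Wikidata `wbgetentities` API in batches of 50 titles. The helper names (`link_to_title`, `is_human`, `fetch_entities`, `human_titles`) and the User-Agent string are made up for illustration. Links that are redirects, or pages with no Wikidata item, will probably come back as missing and would need separate handling.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
HUMAN_QID = "Q5"  # Wikidata item for "human"


def link_to_title(link):
    """Turn '/wiki/Dre_Kirkpatrick' into the page title 'Dre Kirkpatrick'."""
    title = urllib.parse.unquote(link.strip())
    if title.startswith("/wiki/"):
        title = title[len("/wiki/"):]
    return title.replace("_", " ")


def is_human(entity):
    """Return True if the entity has an 'instance of' (P31) claim pointing to Q5."""
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == HUMAN_QID:
            return True
    return False


def fetch_entities(titles):
    """Fetch Wikidata entities for up to 50 English Wikipedia titles in one request."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": "|".join(titles),
        "props": "claims|sitelinks",
        "sitefilter": "enwiki",
        "format": "json",
    })
    request = urllib.request.Request(
        f"{WIKIDATA_API}?{params}",
        # Hypothetical User-Agent; replace with something identifying your script.
        headers={"User-Agent": "people-filter-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("entities", {})


def human_titles(titles):
    """Return the set of canonical enwiki titles whose item is an instance of human."""
    found = set()
    for start in range(0, len(titles), 50):  # assumed limit of 50 titles per request
        batch = titles[start:start + 50]
        for entity in fetch_entities(batch).values():
            if is_human(entity):
                sitelink = entity.get("sitelinks", {}).get("enwiki", {})
                found.add(sitelink.get("title"))
    return found
```

With pandas, the filter could then look something like `df[df["link"].map(link_to_title).isin(people)]`, where `people = human_titles(list(df["link"].map(link_to_title)))` and the column name `link` is an assumption about the CSV. For 100k rows this means roughly 2,000 requests, so adding a short pause between batches and caching results to disk would be sensible.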