Update README.md

Lucky 2023-08-29 17:35:54 -03:00 committed by GitHub
parent 96dd5f7e84
commit a5cab089a2

@@ -14,6 +14,10 @@ Consider doing your own data analysis. If you save your CSV, and make a pull req
- Sentiment analysis is also performed.
- **X number of "posts by this ID" with graphical representation.**
**Limitations**
- 4chan recycles poster IDs, so they are not unique identifiers. As a result, the n-pbtid data mining is not fully accurate at the upper bound, but it should be closer to representative at the lower bound.
- Time-of-day scraping for a time zone of interest (e.g. New York posting hours) hasn't been implemented. A workaround is to scrape the threads between 0900h and 1700h in your target local time, at 3-hour intervals so that old threads have time to die.
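
The time-of-day workaround could be scheduled roughly as follows. This is only a sketch in base R: the `target_tz` value, the example date, and the variable names are assumptions for illustration, and the scraping call itself is left out.

```r
# Sketch: build scrape times from 0900h to 1700h in a target time zone,
# spaced 3 hours apart so old threads have time to die between runs.
target_tz <- "America/New_York"   # assumed time zone of interest
day       <- as.Date("2023-08-29") # assumed example date

start <- as.POSIXct(paste(day, "09:00:00"), tz = target_tz)
end   <- as.POSIXct(paste(day, "17:00:00"), tz = target_tz)

# seq() stops at the last step that does not exceed `end`
scrape_times <- seq(from = start, to = end, by = "3 hours")

# Show the schedule in the target zone and in UTC (useful for cron-style jobs)
print(format(scrape_times, "%H:%M", tz = target_tz))
print(format(scrape_times, "%Y-%m-%d %H:%M", tz = "UTC", usetz = TRUE))
```

Each entry in `scrape_times` is a moment at which the existing scraper could be run. Widening the window or changing the interval only means editing `end` or `by`.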
# html_text vs html_text2 from rvest
I ran an experiment comparing the text output in `tidy_pol_fixed2` when using `html_text` versus `html_text2`.
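
As a rough illustration of how the two functions differ, here is a minimal sketch on a made-up HTML fragment. The fragment and variable names are hypothetical stand-ins, not the project's scraped data or the `tidy_pol_fixed2` pipeline.

```r
library(rvest)

# Hypothetical fragment standing in for a scraped post body
page <- minimal_html("<p>first line<br>second line</p><p>next   paragraph</p>")
node <- html_element(page, "body")

# html_text() returns the raw underlying text: <br> is ignored and
# whitespace is kept exactly as it appears in the source
raw_text <- html_text(node)

# html_text2() approximates how a browser renders the text: <br> becomes
# a line break, paragraphs are separated, and runs of spaces are collapsed
rendered_text <- html_text2(node)

print(raw_text)
print(rendered_text)
```

The difference matters for downstream text analysis, because `html_text` can glue together words on either side of a `<br>`, while `html_text2` keeps them on separate lines.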