Email is the most frequently used web application for communication and collaboration due to its easy access, fast interactions, and convenient management. More than 60% of email traffic consists of business-to-consumer (B2C) emails (e.g., flight reservations, payment reminders, order confirmations). Most of these emails are generated by filling a template with user- or transaction-specific values from databases. In this paper we describe various algorithms related to extracting important information from these emails. Unlike web pages, emails are personal, and due to privacy and legal considerations no one other than the recipient may view them. Thus, adapting extraction techniques used for web pages, such as HTML wrapper-based techniques, poses privacy and scalability challenges. We describe an end-to-end information extraction system for emails: data collection, anonymization, classification, building the information extraction models, deployment, and monitoring. To handle the privacy and scalability issues, we focus on algorithms that can work with a minimal number of human-annotated samples for building classifiers and extraction techniques. Similarly, we present algorithms that minimize the number of samples needed for human inspection to detect precision and recall gaps in the extraction pipeline.
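To make the template observation concrete, here is a minimal sketch of how two emails produced from the same template might be compared to separate fixed text from filled-in values. This is not the paper's algorithm: it uses plain token diffing from Python's standard library, and the function names `induce_template` and `extract_values`, the `<VAR>` slot marker, and the sample messages are all hypothetical, chosen only for illustration. It also assumes the inputs are synthetic or already anonymized, in keeping with the privacy constraint described in the abstract.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

SLOT = "<VAR>"  # hypothetical marker for a variable field


def induce_template(email_a: str, email_b: str) -> List[str]:
    """Token-level template: shared tokens are kept, differing runs become one slot."""
    a, b = email_a.split(), email_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    template: List[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(a[i1:i2])  # text common to both emails
        else:
            template.append(SLOT)      # replaced/inserted/deleted run
    return template


def extract_values(template: List[str], email: str) -> Optional[List[str]]:
    """Return the slot values if the email fits the template, else None."""
    parts = [r"(.+?)" if tok == SLOT else re.escape(tok) for tok in template]
    pattern = r"\s+".join(parts)
    match = re.fullmatch(pattern, " ".join(email.split()))
    return list(match.groups()) if match else None


# Synthetic examples (not real email data).
EMAIL_A = "Your order 1234 has shipped to Alice Smith on Monday"
EMAIL_B = "Your order 98765 has shipped to Bob Jones on Friday"
EMAIL_C = "Your order 555 has shipped to Carol Ann Lee on Tuesday"

template = induce_template(EMAIL_A, EMAIL_B)
values = extract_values(template, EMAIL_C)
```

A design note on this sketch: the template is induced from only two messages and matched with lazy regular-expression groups, so it would break on templates with optional or repeated sections. Handling those cases at scale, without a human reading the messages, is the kind of problem the abstract says the system addresses.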