<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: TFIDF In Libraries: Part I of III (For Librarians)	</title>
	<atom:link href="./index.html" rel="self" type="application/rss+xml" />
	<link>./../index.html</link>
	<description>Artist- and Librarian-At-Large</description>
	<lastBuildDate>
	Sat, 04 Jun 2016 18:04:58 +0000	</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.1.8</generator>
			<item>
				<title>
				By: Infomotions Mini-Musings &#187; Blog Archive &#187; Great Ideas Coefficient / Eric Lease Morgan				</title>
				<link>./../comment-page-1/index.html#comment-1617</link>
		<dc:creator><![CDATA[Infomotions Mini-Musings &#187; Blog Archive &#187; Great Ideas Coefficient / Eric Lease Morgan]]></dc:creator>
		<pubDate>Sat, 27 Mar 2010 11:58:10 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-1617</guid>
					<description><![CDATA[[...] they mentioned the &#8220;great ideas&#8221;. Such a thing can be done through the application of TFIDF. Here&#8217;s [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] they mentioned the &#8220;great ideas&#8221;. Such a thing can be done through the application of TFIDF. Here&#8217;s [&#8230;]</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: Infomotions Mini-Musings &#187; Blog Archive &#187; Automatic metadata generation / Eric Lease Morgan				</title>
				<link>./../comment-page-1/index.html#comment-1133</link>
		<dc:creator><![CDATA[Infomotions Mini-Musings &#187; Blog Archive &#187; Automatic metadata generation / Eric Lease Morgan]]></dc:creator>
		<pubDate>Fri, 31 Jul 2009 02:22:06 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-1133</guid>
					<description><![CDATA[[...] but not extraordinarily well. I then learned about Term Frequency Inverse Document Frequency (TFIDF) to calculate &#8220;relevance&#8221;, and T-Score to calculate the probability of two words [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] but not extraordinarily well. I then learned about Term Frequency Inverse Document Frequency (TFIDF) to calculate &#8220;relevance&#8221;, and T-Score to calculate the probability of two words [&#8230;]</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: Infomotions Mini-Musings &#187; Blog Archive &#187; Text mining: Books and Perl modules / Eric Lease Morgan				</title>
				<link>./../comment-page-1/index.html#comment-1051</link>
		<dc:creator><![CDATA[Infomotions Mini-Musings &#187; Blog Archive &#187; Text mining: Books and Perl modules / Eric Lease Morgan]]></dc:creator>
		<pubDate>Thu, 04 Jun 2009 02:14:58 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-1051</guid>
					<description><![CDATA[[...] my explorations of term frequency/inverse document frequency (TFIDF) I became aware of a relatively new field of study called text mining. In many ways, text mining is [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] my explorations of term frequency/inverse document frequency (TFIDF) I became aware of a relatively new field of study called text mining. In many ways, text mining is [&#8230;]</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: Infomotions Mini-Musings &#187; Blog Archive &#187; TFIDF In Libraries: Part III of III (For thinkers) / Eric Lease Morgan				</title>
				<link>./../comment-page-1/index.html#comment-1039</link>
		<dc:creator><![CDATA[Infomotions Mini-Musings &#187; Blog Archive &#187; TFIDF In Libraries: Part III of III (For thinkers) / Eric Lease Morgan]]></dc:creator>
		<pubDate>Sun, 31 May 2009 20:30:42 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-1039</guid>
					<description><![CDATA[[...] is the third of the three-part series on the topic of TFIDF in libraries. In Part I the why&#8217;s and wherefore&#8217;s of TFIDF were outlined. In Part II TFIDF subroutines and [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] is the third of the three-part series on the topic of TFIDF in libraries. In Part I the why&#8217;s and wherefore&#8217;s of TFIDF were outlined. In Part II TFIDF subroutines and [&#8230;]</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: Infomotions Mini-Musings &#187; Blog Archive &#187; TFIDF In Libraries: Part II of III (For thinkers) / Eric Lease Morgan				</title>
				<link>./../comment-page-1/index.html#comment-1037</link>
		<dc:creator><![CDATA[Infomotions Mini-Musings &#187; Blog Archive &#187; TFIDF In Libraries: Part II of III (For thinkers) / Eric Lease Morgan]]></dc:creator>
		<pubDate>Sun, 31 May 2009 20:28:25 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-1037</guid>
					<description><![CDATA[[...] is the third of the three-part series on the topic of TFIDF in libraries. In Part I the why&#8217;s and wherefore&#8217;s of TFIDF were outlined. In Part II TFIDF subroutines and [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] is the third of the three-part series on the topic of TFIDF in libraries. In Part I the why&#8217;s and wherefore&#8217;s of TFIDF were outlined. In Part II TFIDF subroutines and [&#8230;]</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: Infomotions Mini-Musings &#187; Blog Archive &#187; TFIDF In Libraries: Part II of III (For programmers) / Eric Lease Morgan				</title>
				<link>./../comment-page-1/index.html#comment-992</link>
		<dc:creator><![CDATA[Infomotions Mini-Musings &#187; Blog Archive &#187; TFIDF In Libraries: Part II of III (For programmers) / Eric Lease Morgan]]></dc:creator>
		<pubDate>Tue, 21 Apr 2009 02:42:42 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-992</guid>
					<description><![CDATA[[...] where relevancy ranking techniques are explored through a set of simple Perl programs. In Part I relevancy ranking was introduced and explained. In Part III additional word/document weighting [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] where relevancy ranking techniques are explored through a set of simple Perl programs. In Part I relevancy ranking was introduced and explained. In Part III additional word/document weighting [&#8230;]</p>
]]></content:encoded>
						</item>
						<item>
				<title>
				By: egarcia				</title>
				<link>./../comment-page-1/index.html#comment-983</link>
		<dc:creator><![CDATA[egarcia]]></dc:creator>
		<pubDate>Tue, 14 Apr 2009 15:58:58 +0000</pubDate>
		<guid isPermaLink="false">./../../../../index.html?p=258#comment-983</guid>
					<description><![CDATA[Hi, there:

I read your article with interest. Here are a few points worth mentioning:

1. IDF is defined as log(D/d), where D is the number of documents in a collection and d is the number of documents mentioning a given term, regardless of whether the documents are relevant to said term. The base of the log does not matter (it can be base 10, 2, etc). Logs are taken because most scoring functions in IR are assumed to be additive and because terms are assumed to be independent of one another (even though this is often not exactly the case).

2. IDF is a measure of the discriminatory power of a term (term specificity), but it does not measure relevancy. Indeed, IDF is a term weight score in the absence of relevance information.

3. IDF is a small pixel in the bigger picture of the Robertson-Sparck Jones Probabilistic Model (RSJ-PM). A tutorial explaining the RSJ-PM is available at http://www.miislita.com/.

4. With unstructured, unfocused, and generic collections at the scale of the Web (e.g. those indexed by commercial search engines like Google), the stability of IDF, and its reliability as a scoring function, has been questioned by several authors.

Regards

Dr. Edel Garcia
http://www.miislita.com/]]></description>
		<content:encoded><![CDATA[<p>Hi, there:</p>
<p>I read your article with interest. Here are a few points worth mentioning:</p>
<p>1. IDF is defined as log(D/d), where D is the number of documents in a collection and d is the number of documents mentioning a given term, regardless of whether the documents are relevant to said term. The base of the log does not matter (it can be base 10, 2, etc). Logs are taken because most scoring functions in IR are assumed to be additive and because terms are assumed to be independent of one another (even though this is often not exactly the case).</p>
<p>2. IDF is a measure of the discriminatory power of a term (term specificity), but it does not measure relevancy. Indeed, IDF is a term weight score in the absence of relevance information.</p>
<p>3. IDF is a small pixel in the bigger picture of the Robertson-Sparck Jones Probabilistic Model (RSJ-PM). A tutorial explaining the RSJ-PM is available at <a href="http://www.miislita.com/" rel="nofollow">http://www.miislita.com/</a>.</p>
<p>4. With unstructured, unfocused, and generic collections at the scale of the Web (e.g. those indexed by commercial search engines like Google), the stability of IDF, and its reliability as a scoring function, has been questioned by several authors.</p>
<p>Regards</p>
<p>Dr. Edel Garcia<br />
<a href="http://www.miislita.com/" rel="nofollow">http://www.miislita.com/</a></p>
]]></content:encoded>
						</item>
			</channel>
</rss>
