
Using Stanford CoreNLP to get part-of-speech (POS) tags for Arabic text, Python example

Install:

$ pip install stanfordcorenlp

Then download stanford-corenlp-4.1.0 and save it in your project’s folder. For Arabic (lang='ar'), the Arabic models jar also needs to be placed in that folder.

To download more Arabic datasets, go to the Leipzig Corpora Collection website.


For the current example dataset, find the POS tag for each word in the text and store the result in this format:

word <space> tag <tab> word2 <space> tag2 …
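As a small illustration of this format, the sketch below builds such a line from (word, tag) pairs and splits it back again. The function names format_pairs and parse_pairs are hypothetical and are not part of the original code.

```python
def format_pairs(pairs):
    """Join (word, tag) pairs as 'word tag<TAB>word2 tag2'."""
    return "\t".join("{} {}".format(word, tag) for word, tag in pairs)


def parse_pairs(line):
    """Split a 'word tag<TAB>word2 tag2' line back into (word, tag) pairs."""
    pairs = []
    for item in line.strip().split("\t"):
        parts = item.split(" ")
        if len(parts) == 2:  # skip empty or malformed items
            pairs.append((parts[0], parts[1]))
    return pairs


line = format_pairs([("hm", "PRP"), ("Almfsdwn", "DTNNS")])
print(repr(line))
print(parse_pairs(line))
```

A trailing tab, like the one find_pos leaves at the end of its string, is dropped by strip() before splitting.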

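from stanfordcorenlp import StanfordCoreNLP

def find_pos(xsent):
    # Tag an Arabic sentence and return "word tag<TAB>word2 tag2<TAB>..."
    # convert_ara_to_bw is a helper (not shown here) that transliterates
    # each Arabic word to Buckwalter before it is written out.
    keepmyfinal = ''
    with StanfordCoreNLP(r'stanford-corenlp-4.1.0', lang='ar') as nlp:
        Keepres = nlp.pos_tag(xsent)  # list of (word, tag) tuples
        for k in Keepres:
            keepmyfinal += "{} {}\t".format(convert_ara_to_bw(k[0]), k[1])
    return keepmyfinal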
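The helper convert_ara_to_bw called in find_pos is not defined in this post. A minimal sketch, assuming it does a plain character-by-character Buckwalter transliteration (which matches output such as y$Erwn), might look like the following. The table is the standard Buckwalter mapping; the author's actual helper may differ.

```python
# Standard Buckwalter transliteration table (Arabic character -> ASCII).
# Assumption: the original convert_ara_to_bw behaves like this sketch.
ARABIC_TO_BUCKWALTER = {
    'ء': "'", 'آ': '|', 'أ': '>', 'ؤ': '&', 'إ': '<', 'ئ': '}',
    'ا': 'A', 'ب': 'b', 'ة': 'p', 'ت': 't', 'ث': 'v', 'ج': 'j',
    'ح': 'H', 'خ': 'x', 'د': 'd', 'ذ': '*', 'ر': 'r', 'ز': 'z',
    'س': 's', 'ش': '$', 'ص': 'S', 'ض': 'D', 'ط': 'T', 'ظ': 'Z',
    'ع': 'E', 'غ': 'g', 'ـ': '_', 'ف': 'f', 'ق': 'q', 'ك': 'k',
    'ل': 'l', 'م': 'm', 'ن': 'n', 'ه': 'h', 'و': 'w', 'ى': 'Y',
    'ي': 'y',
    # Diacritics, written as escapes so they stay visible in the source
    '\u064B': 'F', '\u064C': 'N', '\u064D': 'K', '\u064E': 'a',
    '\u064F': 'u', '\u0650': 'i', '\u0651': '~', '\u0652': 'o',
    '\u0670': '`', '\u0671': '{',
}

_TABLE = str.maketrans(ARABIC_TO_BUCKWALTER)


def convert_ara_to_bw(text):
    """Transliterate Arabic text to Buckwalter; other characters pass through."""
    return text.translate(_TABLE)


print(convert_ara_to_bw('يشعرون'))
```

Characters outside the table, such as spaces, digits and Latin letters, are left unchanged.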

Let us look at a result (the tab-separated pairs are shown one per line):

find_pos('ألا إنهم هم المفسدون ولكن لا يشعرون').rstrip()
>lA IN
<n IN
hm PRP
hm PRP
Almfsdwn DTNNS
w CC
lkn CC
y$Erwn VBP

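Read the file and find the words that share the same tag:

Read the text file

def loadUnqList(p):
    # Return the lines of a UTF-8 text file as a list of strings
    klist = []
    with open(p, encoding='utf-8') as fword:
        klist = fword.read().splitlines()
    return klist

KeepQuran = []
loadquran = loadUnqList('sample_msa_fixed.fo')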

# Result

We can search for a tag such as NNP (proper noun):

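search_Tag = 'NNP'
numres = 200

keepres = []
for line in loadquran:
    # Each line holds tab-separated "word tag" pairs
    for pair in line.split('\t'):
        xi = pair.split(' ')
        # Skip empty or malformed entries
        if len(xi) == 2 and xi[1] == search_Tag:
            # Assumed intent: collect every word that carries the tag
            keepres.append(xi[0])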

# Count the word frequency for each word

import collections
import pandas as pd

counts_nsw = collections.Counter(keepres)
clean_tweets_nsw = pd.DataFrame(counts_nsw.most_common(numres), columns=['words', 'count'])
similar_words = [i[0] for i in counts_nsw.most_common(numres)]

word_frequency = {}

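# plot the result

import arabic_reshaper
import matplotlib.pyplot as plt
from bidi.algorithm import get_display
from typing import Dict, List
from wordcloud import WordCloud

# Reshape each word so its letters join correctly and display right to left
for word_tuple in counts_nsw.most_common(numres):
    reshaped_word = arabic_reshaper.reshape(word_tuple[0])
    key = get_display(reshaped_word)
    word_frequency[key] = word_tuple[1]

def plot_word_cloud(word_list: List[str], word_frequency: Dict[str, float]):
    full_string = ' '.join(word_list)
    reshaped_text = arabic_reshaper.reshape(full_string)
    translated_text = get_display(reshaped_text)
    # Build the Arabic word cloud
    wordc = WordCloud(font_path='tahoma', background_color='white', width=800, height=300).generate(translated_text)
    # Draw the cloud without axes
    plt.imshow(wordc, interpolation='bilinear')
    plt.axis('off')
    plt.tight_layout(pad=0)
    plt.title('Search in Quran Tags, By Faisal Alshargi')
    plt.show()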



   words count
0 بنت 3
1 عبدالله 3
2 بن 3
3 عبدالعزيز 3
4 آل 3
.. … …
56 جدة 1
57 أبو 1
58 أمريكا 1
59 أيار 1
60 سوريا 1

Search for past-tense verbs in the text:

search_Tag = 'VBD'
numres = 200

Search for present-tense verbs in the text:

search_Tag = 'VBP'
numres = 200
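Since the same search is repeated for NNP, VBD and VBP, it can be wrapped in one function. The sketch below is only an illustration: the name count_words_with_tag and the two sample lines are hypothetical, and the lines are assumed to use the "word tag<TAB>word2 tag2" format described earlier.

```python
import collections


def count_words_with_tag(lines, tag, top_n=200):
    """Return the top_n (word, count) pairs for words carrying the given tag."""
    counter = collections.Counter()
    for line in lines:
        for item in line.split("\t"):
            parts = item.split(" ")
            # Skip empty or malformed entries
            if len(parts) == 2 and parts[1] == tag:
                counter[parts[0]] += 1
    return counter.most_common(top_n)


# Made-up sample lines, only to show the call
sample = ["hm PRP\thm PRP\ty$Erwn VBP", "w CC\ty$Erwn VBP"]
print(count_words_with_tag(sample, "VBP"))
print(count_words_with_tag(sample, "VBD"))
```

In the post's own variables, the call would be count_words_with_tag(loadquran, search_Tag, numres).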


Detecting emotion in text (positive, negative), Python example

Install:

$ pip install textblob

from textblob import TextBlob

feedbacks = ['I dont_heat you ',
             "The experience was bad as hell",
             "This app is really helpful and bad",
             "Damn the app tastes like shit ",
             'Please don\'t download the app you will regret it ']

positive_feedbacks = []
negative_feedbacks = []

for feedback in feedbacks:
  feedback_polarity = TextBlob(feedback).sentiment.polarity
  # Polarity above 0 counts as positive; everything else, including 0, as negative
  if feedback_polarity>0:
    positive_feedbacks.append(feedback)
    continue
  negative_feedbacks.append(feedback)

print('Positive_feedbacks Count : {}'.format(len(positive_feedbacks)))
print('positive ', positive_feedbacks)

print('Negative_feedback Count : {}'.format(len(negative_feedbacks)))
print('negative ', negative_feedbacks)

Positive_feedbacks Count : 0
positive  []

Negative_feedback Count : 5
negative  ['I dont_heat you ', 'The experience was bad as hell', 'This app is really helpful and bad', 'Damn the app tastes like shit ', "Please don't download the app you will regret it "]
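The split into positive and negative lists can also be written as a small reusable function. This is a sketch under assumptions: split_by_polarity and toy_polarity are hypothetical names, and the toy scorer only stands in for a real one so that the example runs without TextBlob installed.

```python
def split_by_polarity(texts, polarity_fn, threshold=0.0):
    """Split texts into (positive, negative) lists using a polarity score.

    A text is positive when polarity_fn(text) > threshold; everything else,
    including a score equal to the threshold, goes to the negative list.
    """
    positive, negative = [], []
    for text in texts:
        if polarity_fn(text) > threshold:
            positive.append(text)
        else:
            negative.append(text)
    return positive, negative


def toy_polarity(text):
    # Stand-in scorer for the demo, not a real sentiment model
    return 0.5 if "helpful" in text else -0.5


pos, neg = split_by_polarity(["really helpful", "bad as hell"], toy_polarity)
print(pos, neg)
```

With TextBlob installed, the scorer could be lambda t: TextBlob(t).sentiment.polarity, which gives the same split as the loop above.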
